CN115510869A - End-to-end Tibetan La lattice shallow semantic analysis method - Google Patents

End-to-end Tibetan La lattice shallow semantic analysis method

Info

Publication number: CN115510869A (application CN202210602138.3A)
Authority: CN (China)
Prior art keywords: semantic, lattice, LSTM, Tibetan, layer
Legal status: Granted; currently active
Other languages: Chinese (zh)
Other versions: CN115510869B (granted publication)
Inventors: 班玛宝, 才让加, 张瑞, 慈祯嘉措, 桑杰端珠, 杨毛加
Current and original assignee: Qinghai Normal University
Application CN202210602138.3A filed by Qinghai Normal University; granted and published as CN115510869B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/30: Semantic analysis
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention relates to the technical field of Tibetan La lattice shallow semantic analysis, and in particular to an end-to-end Tibetan La lattice shallow semantic analysis method comprising the following steps: 1. map the input feature sequence (in units of words) and the corresponding mark sequence into low-dimensional real-valued vectors; 2. place a gated highway connection mechanism (GM) in the vertical direction of the LSTM and learn the temporal features and contextual semantic information of the input sentence with a BiLSTM; the GM contains linear connections to the cell's internal inputs and outputs, allowing information to propagate unobstructed between layers; 3. compute the locally normalized distribution over semantic labels at each time step with softmax, for constrained decoding by the output layer; 4. when decoding with the Viterbi algorithm, enforce the predefined BIO and La lattice shallow semantic labeling constraints to regularize the structural relationships between the output semantic labels. The method performs Tibetan La lattice shallow semantic analysis effectively.

Description

End-to-end Tibetan La lattice shallow semantic analysis method
Technical Field
The invention relates to the technical field of Tibetan La lattice shallow semantic analysis, in particular to an end-to-end Tibetan La lattice shallow semantic analysis method.
Background
The aim of Tibetan La lattice shallow semantic analysis is to find the predicate of a given sentence, determine the related main semantic components with the predicate as the core, and assign the corresponding semantic labels, thereby obtaining the sentence-pattern type of the given sentence and the semantic roles played by its components.
The La lattice is a key and difficult point in the classical Tibetan grammar treatises (the Thirty Verses and the Application of Signs), and it is one of the eight Tibetan cases. Oriented toward natural language processing, a machine-learning-based Tibetan La lattice shallow semantic analysis technique can support many downstream Tibetan natural language understanding tasks, such as semantic role labeling, semantic parsing, information extraction, automatic question answering, reading comprehension, and machine translation. In addition, the La lattice is essential knowledge in Tibetan language textbooks: only by mastering its concept and usage can the types of Tibetan La lattice sentences be distinguished accurately, the main semantic components of each sentence be identified, and the actual meaning of each sentence be analyzed further. Therefore, a machine-learning-based Tibetan La lattice shallow semantic analysis technique also has practical value for learning the La lattice.
The traditional shallow semantic analysis task is closely tied to syntactic analysis and depends heavily on syntactic parsing results, which increases its complexity. In recent years, with the maturing of deep learning, end-to-end models that require no syntactic input have achieved good results on shallow semantic analysis. Prior work studied an end-to-end LSTM shallow semantic analysis method and obtained results superior to traditional methods that introduce syntactic information, revealing the potential of LSTMs to capture the latent syntactic structure of a sentence and providing a theoretical and reference basis for further research. To date, no Tibetan shallow semantic analysis method based on an end-to-end model has been reported, nor has any literature on La lattice shallow semantic analysis.
Disclosure of Invention
The invention provides an end-to-end Tibetan La lattice shallow semantic analysis method that can overcome one or more defects of the prior art.
The invention discloses an end-to-end Tibetan La lattice shallow semantic analysis method, which comprises the following steps of:
1. Map the input feature sequence (in units of words) and the corresponding mark sequence into low-dimensional real-valued vectors.
2. Place a gated highway connection mechanism (GM) in the vertical direction of the LSTM and learn the temporal features and contextual semantic information of the input sentence with a BiLSTM; the GM contains linear connections to the cell's internal inputs and outputs, allowing information to propagate smoothly between layers.
3. Compute the locally normalized distribution over semantic labels at each time step with softmax, for constrained decoding by the output layer.
4. When decoding with the Viterbi algorithm, enforce the predefined BIO and La lattice shallow semantic labeling constraints to regularize the structural relationships between the output semantic labels.
Preferably, in step one, a pretrained GloVe word-embedding table is used, V denotes the vocabulary, and C = {0, 1} denotes the set of predicate marks. The original input sequence {w_1, w_2, ..., w_T} and the mark sequence {m_1, m_2, ..., m_T} are mapped by lookup tables into low-dimensional real-valued vectors e(w_t) and e(m_t), where w_t ∈ V and the corresponding mark m_t ∈ C. The vectors e(w_t) and e(m_t) are then concatenated into x_{l,t} as the input of the first LSTM layer:
x_{l,t} = [e(w_t), e(m_t)]
where x_{l,t} is the input of layer l of the LSTM at time t, with l = 1 and t ∈ [1, T].
Preferably, in step two, the first LSTM processes the input sentence in the forward direction, and the output of that layer is then used as the input of the next layer for processing in the reverse direction, laying a foundation for improving the learning of temporal features and for fully acquiring the contextual semantic information at each time step. The LSTM is defined by the standard input, forget, and output gate and cell-update equations (given as formula images in the original), where δ_l denotes the direction of the l-th LSTM layer: the direction is forward when δ_l = -1 and backward when δ_l = 1.
To stack the LSTMs in interleaved fashion, the input x_{l,t} of each layer and the direction parameter δ_l are arranged so that the first layer receives the embedded input, each subsequent layer receives the previous layer's output, and adjacent layers run in opposite directions. The input vector x_{1,t} is the concatenation of the word embedding of w_t and the embedding of the binary feature (t = v) indicating whether w_t is the given predicate.
Preferably, in step two, the linear and nonlinear transformation weights between layers are controlled by a GM placed in the vertical direction of the LSTM, which balances the transfer of information in the vertical direction. Let λ_{l,t} denote the gate of the GM; after the GM, the output of the hidden layer becomes a gated mixture of the layer input h_{l-1,t} and the candidate output (the gate equation is given as a formula image in the original):
h'_{l,t} = LSTM(h_{l-1,t}, h_{l,t-1}).
Preferably, in step two, to reduce overfitting, dropout is applied with a shared dropout mask D_l over the hidden states (the masked-hidden-state equation is given as a formula image in the original).
Given the feature sequence x = {w_1, w_2, ..., w_n} of a La lattice sentence and the corresponding correct semantic tag sequence y = {y_1, y_2, ..., y_n}, the log-likelihood is:
log p(y | x) = Σ_{t=1..n} log p(y_t | x).
Preferably, in step three, from the model's hidden state h_{l,t}, the locally normalized distribution over the output semantic tag y_t can be computed with softmax:
p(y_t | x) = softmax(W_o h_{l,t})^T δ_{y_t}
where W_o is the softmax parameter matrix and δ_{y_t} is a Kronecker delta (one-hot) vector whose dimension equals the number of semantic labels. The training objective of the model is to maximize the probability of the correct labels given the input.
The invention first draws on the design idea of the LSTM gates and places a gated highway connection mechanism (GM) in the vertical direction of the LSTM to balance the transfer of information in the vertical direction. With the GM, information can propagate more smoothly in the spatial and temporal dimensions with only minor information loss. Most importantly, the GM contains a gating function that dynamically selects or ignores vertically propagated information, so that abstract representations from different levels can be passed to the output layer more easily. softmax is then used to compute the locally normalized distribution over semantic tags at each time step for constrained decoding by the output layer. Finally, when decoding with the Viterbi algorithm, the BIO and La lattice shallow semantic labeling constraints defined herein are enforced to regularize the structural relationships between the output semantic labels, improving the accuracy of the final predicted semantic labels.
Drawings
FIG. 1 is a schematic diagram of the La lattice shallow semantic analysis model architecture in the embodiment;
FIG. 2 is a diagram showing the difference between the GM-based LSTM and the ordinary LSTM in the embodiment;
FIG. 3 is a diagram showing the effect of the GM on experimental performance in the embodiment;
FIG. 4 is a diagram showing the effect of constrained decoding on experimental performance in the embodiment;
FIG. 5 is a diagram showing the effect of the temporal-feature learning method on experimental performance in the embodiment.
Detailed Description
For a further understanding of the present invention, reference is made to the following detailed description taken in conjunction with the accompanying drawings and examples. It is to be understood that the examples are illustrative of the invention and not limiting.
Examples
The embodiment provides an end-to-end Tibetan La lattice shallow semantic analysis method comprising the following steps:
1. Map the input feature sequence (in units of words) and the corresponding mark sequence into low-dimensional real-valued vectors.
2. Place a gated highway connection mechanism (GM) in the vertical direction of the LSTM and learn the temporal features and contextual semantic information of the input sentence with a BiLSTM; the GM contains linear connections to the cell's internal inputs and outputs, allowing information to propagate smoothly between layers.
3. Compute the locally normalized distribution over semantic labels at each time step with softmax, for constrained decoding by the output layer.
4. When decoding with the Viterbi algorithm, enforce the predefined BIO and La lattice shallow semantic labeling constraints to regularize the structural relationships between the output semantic labels.
Semantic role labeling (SRL) is a shallow semantic representation with the advantages of being simple and easy to use and applicable in many language applications, and its models and algorithms have been studied in depth; since the goal of Tibetan La lattice shallow semantic analysis is similar to that of the SRL task, the model architecture shown in FIG. 1 is designed with reference to representative SRL research, namely work based on deep BiLSTM and self-attention frameworks. It mainly comprises the following parts:
(1) Embedding layer: maps the input feature sequence (in units of words) and the corresponding mark sequence into low-dimensional real-valued vectors.
(2) LSTM layer: learns the temporal features and contextual semantic information of the input sentence with a BiLSTM, in order to improve the model's ability to learn temporal features and to enhance its semantic representation capacity.
(3) Gated highway connection: to alleviate the vanishing-gradient problem when training the BiLSTM, this embodiment uses a GM, which controls the weights of the linear and nonlinear transformations between layers.
(4) Softmax layer: maps the semantic labels that may be output at each time step into a locally normalized distribution over (0, 1) with the softmax function, for constrained decoding by the output layer.
(5) Constrained decoding layer: to constrain the structural relationships between the output semantic tags during decoding, this embodiment enforces the BIO and La lattice shallow semantic labeling constraints defined herein when decoding with the Viterbi algorithm.
Embedding layer
In the Tibetan La lattice shallow semantic analysis task, a pretrained GloVe word-embedding table is used, V denotes the vocabulary, and C = {0, 1} denotes the set of predicate marks. The original input sequence {w_1, w_2, ..., w_T} and the mark sequence {m_1, m_2, ..., m_T} are mapped by lookup tables into low-dimensional real-valued vectors e(w_t) and e(m_t), where w_t ∈ V and the corresponding mark m_t ∈ C. The vectors e(w_t) and e(m_t) are then concatenated into x_{l,t} as the input of the first LSTM layer:
x_{l,t} = [e(w_t), e(m_t)]
where x_{l,t} is the input of layer l of the LSTM at time t, with l = 1 and t ∈ [1, T].
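As a rough illustration of this embedding step, the sketch below (assuming a PyTorch-style implementation; the dimensions and class names are placeholders, not taken from the patent) builds x_{1,t} by concatenating the word embedding with the embedding of the predicate mark:

```python
# A minimal sketch of the embedding layer, assuming PyTorch.
import torch
import torch.nn as nn

class EmbeddingLayer(nn.Module):
    def __init__(self, vocab_size: int, word_dim: int = 100, mark_dim: int = 16):
        super().__init__()
        # e(w_t): word embeddings, which could be initialised from pretrained GloVe vectors
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        # e(m_t): embedding of the binary predicate mark m_t in {0, 1}
        self.mark_emb = nn.Embedding(2, mark_dim)

    def forward(self, words: torch.Tensor, marks: torch.Tensor) -> torch.Tensor:
        # words, marks: (batch, T) index tensors
        # x_{1,t} = [e(w_t), e(m_t)]: concatenation along the feature dimension
        return torch.cat([self.word_emb(words), self.mark_emb(marks)], dim=-1)
```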
Bidirectional LSTM
This embodiment casts Tibetan La lattice shallow semantic analysis as an end-to-end sequence labeling task. Sequence labeling relies heavily on the ability to learn temporal features and textual context, so the first LSTM processes the input sentence in the forward direction, and the output of that layer is then used as the input of the next layer for processing in the reverse direction, laying a foundation for improving the learning of temporal features and for fully acquiring the contextual semantic information at each time step. The LSTM is defined by the standard input, forget, and output gate and cell-update equations (given as formula images in the original), where δ_l denotes the direction of the l-th LSTM layer: the direction is forward when δ_l = -1 and backward when δ_l = 1.
To stack the LSTMs in interleaved fashion, the input x_{l,t} of each layer and the direction parameter δ_l are arranged so that the first layer receives the embedded input, each subsequent layer receives the previous layer's output, and adjacent layers run in opposite directions. The input vector x_{1,t} is the concatenation of the word embedding of w_t and the embedding of the binary feature (t = v) indicating whether w_t is the given predicate.
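The interleaved stacking can be sketched as follows (a rough illustration assuming a PyTorch-style implementation; reversing the time axis on alternate layers plays the role of the direction parameter δ_l, and the layer sizes are placeholders):

```python
# A minimal sketch of interleaved (alternating-direction) LSTM stacking, assuming PyTorch.
import torch
import torch.nn as nn

class InterleavedLSTMStack(nn.Module):
    def __init__(self, input_dim: int, hidden_dim: int, num_layers: int = 4):
        super().__init__()
        dims = [input_dim] + [hidden_dim] * (num_layers - 1)
        self.layers = nn.ModuleList(
            [nn.LSTM(d, hidden_dim, batch_first=True) for d in dims]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = x  # (batch, T, input_dim)
        for l, lstm in enumerate(self.layers):
            # Odd layers process the sequence right-to-left: flipping the time
            # axis before and after the LSTM emulates the direction parameter.
            if l % 2 == 1:
                h = torch.flip(h, dims=[1])
            h, _ = lstm(h)
            if l % 2 == 1:
                h = torch.flip(h, dims=[1])
        return h
```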
GM based on LSTM
To alleviate the vanishing-gradient problem when training the BiLSTM, the linear and nonlinear transformation weights between layers are controlled by a GM placed in the vertical direction of the LSTM, which balances the transfer of information in the vertical direction. Let λ_{l,t} denote the gate of the GM; after the GM, the output of the hidden layer becomes a gated mixture of the layer input h_{l-1,t} and the candidate output (the gate equation is given as a formula image in the original):
h'_{l,t} = LSTM(h_{l-1,t}, h_{l,t-1}).
For clarity, FIG. 2 shows the difference between the GM-based LSTM and the ordinary LSTM.
In FIG. 2, h_{l-1,t} denotes the output of the previous layer, which is also the input of the current layer, and h'_{l,t} denotes the candidate output, i.e. the output of the LSTM cell. The GM makes a linear connection between h_{l-1,t} and h'_{l,t}, which greatly helps information travel quickly in the vertical direction, and λ_{l,t} decides how much of the lower layer's information is passed up to the next layer. During training, the closer λ_{l,t} is to 1, the more information is passed upward; when λ_{l,t} = 1, the input is copied directly to the output without change, so the GM mechanism lets bottom-layer information flow more smoothly to the top layer. Conversely, the closer λ_{l,t} is to 0, the less information is passed upward; when λ_{l,t} = 0, the GM degrades to the ordinary LSTM. Since the GM operates inside the neuron, the transfer of lower-layer information along the temporal direction is unaffected.
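The following sketch illustrates this gating behaviour (assuming a PyTorch-style implementation; the sigmoid gate and the projection used below are assumptions made for illustration, since the recovered text only specifies the mixing rule between h_{l-1,t} and h'_{l,t}, not the exact parameterisation of λ_{l,t}):

```python
# A minimal sketch of the gated highway connection (GM), assuming PyTorch.
# The exact form of the gate lambda_{l,t} is an assumption; only the mixing rule
# between the layer input and the LSTM candidate output follows from the text above.
import torch
import torch.nn as nn

class HighwayLSTMLayer(nn.Module):
    def __init__(self, input_dim: int, hidden_dim: int):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        # Hypothetical gate: lambda_{l,t} = sigmoid(W_g [h_{l-1,t}; h'_{l,t}] + b_g)
        self.gate = nn.Linear(input_dim + hidden_dim, hidden_dim)
        # Projection so the layer input can be mixed with h'_{l,t} when dims differ
        self.proj = nn.Linear(input_dim, hidden_dim, bias=False)

    def forward(self, h_prev_layer: torch.Tensor) -> torch.Tensor:
        candidate, _ = self.lstm(h_prev_layer)  # h'_{l,t}
        lam = torch.sigmoid(self.gate(torch.cat([h_prev_layer, candidate], dim=-1)))
        # lambda -> 1: copy the lower layer's information upward;
        # lambda -> 0: degrade to the ordinary LSTM output.
        return lam * self.proj(h_prev_layer) + (1.0 - lam) * candidate
```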
To reduce overfitting, dropout is applied with a shared dropout mask D_l over the hidden states (the masked-hidden-state equation is given as a formula image in the original).
Given the feature sequence x = {w_1, w_2, ..., w_n} of a La lattice sentence and the corresponding correct semantic tag sequence y = {y_1, y_2, ..., y_n}, the log-likelihood is:
log p(y | x) = Σ_{t=1..n} log p(y_t | x).
softmax layer
Hidden state according to model h l,t Output semantic tag y can be computed using softmax t Local normalized distribution of (c):
Figure BDA0003669798890000074
w in the above formula o Is a softmax parameterA matrix of numbers is formed by a matrix of numbers,
Figure BDA0003669798890000075
is Kronecker delta, and the dimension is consistent with the number of semantic labels; the goal of model training is to maximize the probability of a correct label given the input.
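A minimal sketch of this output layer, again assuming a PyTorch-style implementation (log-probabilities are returned purely as a convenience for the decoding sketch given after Table 1; the hidden size and tag count are placeholders):

```python
# A minimal sketch of the softmax output layer, assuming PyTorch.
import torch
import torch.nn as nn

class TagScorer(nn.Module):
    def __init__(self, hidden_dim: int, num_tags: int):
        super().__init__()
        self.w_o = nn.Linear(hidden_dim, num_tags)  # W_o

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, T, hidden_dim) -> (batch, T, num_tags), normalised over tags
        return torch.log_softmax(self.w_o(h), dim=-1)
```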
Constrained decoding layer
To incorporate constraints on the output structure during decoding, this embodiment defines BIO and La lattice shallow semantic labeling constraints based on the BIO sequence labeling scheme and the La lattice shallow semantic label specification. Both constraints are then enforced when decoding with the Viterbi algorithm (a decoding sketch follows Table 1); the constraints are exemplified below.
(1) BIO constraint
BIO is a commonly used sequence labeling scheme in NLP: B marks the beginning of a labeled segment, I marks the middle or end of a labeled segment, and O marks everything else. The constraint rejects any sequence that does not form a valid BIO transition, e.g. B-A0 followed by I-A1.
(2) La lattice shallow semantic labeling constraints
Unique semantic tags: for each La lattice sentence pattern, the semantic tags A0, A1, A2 and the pattern-specific tags in Table 1 may appear at most once.
Restricted semantic tags: pattern-specific tags from Table 1 belonging to different sentence patterns may not co-occur, e.g. AM-Bas followed by AM-L2 is rejected.
Ordered semantic tags: a pattern-specific tag AM-Li in Table 1 may not appear out of order before another pattern-specific tag, e.g. AM-L1 followed by AM-Bas is rejected.
Continuation semantic tags: a continuation tag may exist only if its base semantic tag has occurred before it, e.g. I-A0 may only follow B-A0 (or I-A0).
TABLE 1: Constraint table of pattern-specific and common semantic tags (rendered as an image in the original; not reproduced here).
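As an illustration of how such constraints can be enforced at decoding time, the sketch below (assuming NumPy; the La lattice tag inventory of Table 1 is not reproduced, so only the generic BIO rule is encoded as a transition mask) masks invalid transitions before running the Viterbi recursion:

```python
# A minimal sketch of constrained Viterbi decoding with a BIO transition mask, assuming NumPy.
import numpy as np

def bio_transition_mask(tags):
    """allowed[i, j] = True if tag j may follow tag i under the BIO scheme."""
    n = len(tags)
    allowed = np.ones((n, n), dtype=bool)
    for i, prev in enumerate(tags):
        for j, cur in enumerate(tags):
            if cur.startswith("I-"):
                role = cur[2:]
                # I-X may only follow B-X or I-X (e.g. B-A0 followed by I-A1 is rejected)
                allowed[i, j] = prev in (f"B-{role}", f"I-{role}")
    return allowed

def constrained_viterbi(log_probs, allowed):
    """log_probs: (T, n_tags) local log-distributions from the softmax layer."""
    T, n = log_probs.shape
    penalty = np.where(allowed, 0.0, -np.inf)  # forbid invalid transitions
    score = log_probs[0].copy()
    back = np.zeros((T, n), dtype=int)
    for t in range(1, T):
        total = score[:, None] + penalty + log_probs[t][None, :]
        back[t] = total.argmax(axis=0)
        score = total.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

The La-lattice-specific constraints (uniqueness, restriction, ordering, continuation) could be added in the same spirit, by further zeroing out entries of the allowed matrix or by tracking sentence-pattern state during the recursion.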
In the five usages of the La lattice (the action, purpose, dependence, identity, and time usages), the La lattice particles can in principle be substituted for one another, apart from the exceptions noted in the original (the particles themselves are given there in Tibetan script); they are therefore referred to collectively as the La lattice particles. The La lattice particles are of two kinds, non-free and free: the choice of a non-free particle is constrained by the suffix letter of the preceding syllable, whereas a free particle may be attached regardless of the preceding syllable.
According to the semantic function, attachment rules, and different usages of the La lattice particles, their usage can be divided into five sentence patterns: the action-case, purpose-case, dependence-case, identity-case, and time-case sentence patterns. In the examples below, the Tibetan example sentences and particles are rendered as images in the original and are represented here by their English glosses.
Action-case sentence: a sentence expressing an action that has been, is being, or will be carried out at some place of implementation [9]. Its characteristic is that the sentence contains three main semantic components, the place of implementation, the action carried out there, and the La lattice particle, none of which can be dispensed with. For example, in the sentence glossed "studying in the library", the place of implementation is "library", the action carried out is "studying", and the La lattice particle is the corresponding Tibetan particle. If the machine can correctly recognize and distinguish these semantic components, the semantics of the sentence can essentially be understood correctly.
Purpose-case sentence: a sentence expressing an action carried out for a certain purpose. Its characteristic is that the sentence contains three main semantic components, the purpose, the action carried out to achieve that purpose, and the La lattice particle, none of which can be dispensed with. For example, in the sentence glossed "striving to gain knowledge", the purpose is "to gain knowledge", the action carried out to achieve it is "striving", and the La lattice particle is the corresponding Tibetan particle. If the machine can correctly recognize and distinguish these semantic components, the semantics of the sentence can essentially be understood correctly.
Dependence-case sentence: a sentence expressing that something exists at, or depends on, a place to which a La lattice particle is attached. This pattern has two main characteristics. First, the sentence contains three main semantic components: the place to which the La lattice particle is attached, the thing that exists or depends there, and the La lattice particle. For example, in the sentence glossed "there are many students in the classroom", the place is "classroom", the existing thing is "students", and the La lattice particle is the corresponding Tibetan particle; if the machine can correctly recognize and distinguish these semantic components, the semantics can essentially be understood correctly. Second, when the predicate of this pattern is an existential auxiliary, the dependent component of the first characteristic may be absent; for example, in the sentence glossed "there is (something) in the classroom", the place is "classroom", the existential auxiliary is "there is", and the La lattice particle is the corresponding Tibetan particle. If the machine can correctly recognize and distinguish these semantic components, it can likewise essentially understand the semantics correctly.
Identity-case sentence: a sentence expressing that something changes into something else, so that the two are "identical" in the result of the change brought about by the action. This pattern has two main characteristics. First, the sentence contains three main semantic components: the original thing, the other thing it changes into, and the La lattice particle. For example, in the sentence glossed "translate Chinese into Tibetan", the original thing is "Chinese", the thing it becomes is "Tibetan", and the La lattice particle is the corresponding Tibetan particle; if the machine can correctly recognize and distinguish these semantic components, the semantics can essentially be understood correctly. Second, the sentence may instead contain the result of the change brought about by the action, the action itself, and the La lattice particle as its three main semantic components. For example, in the sentence glossed "make him happy", the result of the change is "happy", the action is "make", and the La lattice particle is the corresponding Tibetan particle. If the machine can correctly recognize and distinguish these semantic components, it can likewise essentially understand the semantics correctly.
Time-case sentence: a sentence expressing the time at which an action is carried out. This pattern has two main characteristics. First, the sentence contains three main semantic components: the time at which the action is carried out, the action itself, and the La lattice particle. For example, in the sentence glossed "studied for three years", the time of the action is "three years", the action carried out is "studying", and the La lattice particle is the corresponding Tibetan particle; if the machine can correctly recognize and distinguish these semantic components, the semantics can essentially be understood correctly. Second, the sentence may contain only two semantic components, the time at which an action is carried out and the La lattice particle, with no explicit action; this phenomenon is quite common. For example, in the phrase glossed "in three years", the time of the action is "three years" and the La lattice particle is the corresponding Tibetan particle. If the machine can correctly recognize and distinguish these semantic components, it can likewise essentially understand the semantics correctly.
Experiments
Since no public Tibetan La lattice shallow semantic analysis dataset exists, this embodiment first extracted 20,000 sentences containing exactly one La lattice particle from the Tibetan corpus built by the laboratory research group. After preprocessing, 12,000 of these sentences were selected for La lattice shallow semantic annotation. Finally, following the La lattice shallow semantic annotation specification formulated in this embodiment, the Tibetan La lattice shallow semantic analysis dataset was built by manual annotation; for convenience it is referred to as TLSD. In the experiments, the TLSD dataset was divided into training, validation, and test sets in a ratio of 8:1:1.
During the experiments, to ensure that the results are comparable, the tuning ranges of the hyper-parameters of all models were restricted, and the best hyper-parameter combination found within those ranges after repeated tuning was selected; the model parameters are detailed in Table 2.
TABLE 2: Model parameters (rendered as an image in the original; not reproduced here).
Baseline methods
Since no literature on Tibetan La lattice shallow semantic analysis is currently available, and no public Tibetan La lattice shallow semantic analysis dataset exists, the effect of the model of this embodiment cannot be verified by direct comparison with previous work.
For this reason, several methods from the references used in building the model of this embodiment are selected as baseline models to verify the effect of the proposed model.
(1) LSTM+CRF: a semantic role labeling method based on deep bidirectional RNNs.
(2) DBLSTM: a semantic role labeling method based on deep bidirectional LSTMs.
(3) Self-Attention: a semantic role labeling method based on the self-attention mechanism.
(4) End-to-End: a span-based end-to-end semantic role labeling method.
(5) BiLSTM+GM+CD and BiLSTM+GM+CD(V): the models of this embodiment, referring respectively to the Tibetan La lattice shallow semantic analysis model when the predicates of the sentences must be predicted by the model and when they are given in advance.
Evaluation metric
In this embodiment, accuracy (ACC), a commonly used evaluation metric for sequence labeling tasks, is selected to evaluate model performance. Let TP denote positive samples predicted as positive, FP negative samples predicted as positive, FN positive samples predicted as negative, and TN negative samples predicted as negative; the accuracy is computed as:
ACC = (TP + TN) / (TP + TN + FP + FN)
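A small worked sketch of this metric, assuming NumPy (for multi-class sequence tagging the formula reduces to the fraction of token positions whose predicted tag matches the gold tag):

```python
# A minimal sketch of the accuracy computation over predicted tag sequences, assuming NumPy.
import numpy as np

def tag_accuracy(pred, gold):
    """In a binary formulation ACC = (TP + TN) / (TP + TN + FP + FN);
    for sequence tagging this is the fraction of positions where pred == gold."""
    pred, gold = np.asarray(pred), np.asarray(gold)
    return float((pred == gold).mean())
```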
results and analysis of the experiments
Comparison of model Performance on TLSD datasets
In order to verify the effectiveness and superiority of the model of the embodiment, the Tibetan language La lattice shallow semantic analysis performance of the baseline model and the model of the embodiment is compared, and the experimental comparison result of each model is shown in Table 3.
TABLE 3: Experimental comparison of the models (rendered as an image in the original; not reproduced here).
As the results in Table 3 show, the model of this embodiment improves on all of the baseline models. When the model predicts the predicates itself, its La lattice shallow semantic analysis accuracy on the test set is higher than that of LSTM+CRF, DBLSTM, Self-Attention, and End-to-End by 3.33, 3.01, 1.58, and 1.87 percentage points respectively, showing that the proposed model performs Tibetan La lattice shallow semantic analysis better. In addition, the model also works well when jointly predicting the predicates and the other corresponding semantic labels: on the test set its La lattice shallow semantic analysis accuracy is only 0.93% lower than when the predicates are given in advance, verifying the superiority of the proposed model.
Both the baseline models and the proposed model perform well on the Tibetan La lattice shallow semantic analysis task, for three main reasons: first, all sentences in the TLSD dataset are Tibetan simple sentences containing exactly one La lattice particle; second, the sentences in the TLSD dataset are not long, ranging from 4 to 30 words; third, compared with general semantic role labeling, La lattice shallow semantic labels are easier to identify and annotate. The reasons the proposed model performs better than the baselines are twofold: first, the GM balances the transfer of information in the vertical direction and further alleviates the vanishing-gradient problem; second, constraining the output structure during decoding yields a more reasonable structure over the output semantic tags.
Effectiveness of the GM
To verify the effectiveness of the GM, the La lattice shallow semantic analysis performance of the model was examined with the GM-based BiLSTM and with an ordinary BiLSTM. The first setting uses the ordinary BiLSTM and the second uses the GM-based BiLSTM; the results on the test set are shown in FIG. 3.
As FIG. 3 shows, the accuracy with the GM-based BiLSTM is 1.06% higher than with the ordinary BiLSTM, confirming the effectiveness of the GM.
Effectiveness of the constrained decoding method
To verify the effectiveness of the constrained decoding method, the La lattice shallow semantic analysis performance of the model was examined with and without constrained decoding. The first setting does not use constrained decoding and the second does; the results on the test set are shown in FIG. 4.
As FIG. 4 shows, the La lattice shallow semantic analysis accuracy of the model is 0.76% higher with constrained decoding than without it, verifying the effectiveness of the constrained decoding method.
Impact of the temporal-feature learning method on model performance
To examine the influence of the temporal-feature learning method on model performance, the La lattice shallow semantic analysis performance of the model was compared when using an LSTM and when using a BiLSTM. The first setting learns temporal features with an LSTM and the second with a BiLSTM; the results on the test set are shown in FIG. 5.
As FIG. 5 shows, the La lattice shallow semantic analysis accuracy of the model is 2.57% higher when learning temporal features with the BiLSTM than with the LSTM, indicating that the model performs better when using the BiLSTM.
Concluding remarks
To alleviate the vanishing-gradient problem and balance the transfer of information in the vertical direction, this embodiment mixes linear and nonlinear information by placing a GM in the vertical direction of the LSTM, letting information propagate more smoothly in the spatial and temporal dimensions. To regularize the structural relationships between the output semantic tags and improve the accuracy of the predicted semantic labels, the BIO and La lattice shallow semantic labeling constraints defined in this embodiment are enforced when decoding with the Viterbi algorithm. The experimental results show that on the test set the La lattice shallow semantic analysis accuracy of this method reaches 90.59%, outperforming several baseline models.
The present invention and its embodiments have been described above schematically, and the description is not limiting; what is shown in the drawings is only one embodiment of the present invention, and the actual structure is not limited thereto. Therefore, if a person skilled in the art, having received its teaching and without departing from the spirit of the invention, devises structural modes and embodiments similar to this technical solution without inventive effort, they shall fall within the protection scope of the invention.

Claims (6)

1. An end-to-end Tibetan La lattice shallow semantic analysis method, characterized by comprising the following steps:
1. mapping the input feature sequence (in units of words) and the corresponding mark sequence into low-dimensional real-valued vectors;
2. placing a gated highway connection mechanism GM in the vertical direction of the LSTM and learning the temporal features and contextual semantic information of the input sentence with a BiLSTM, the GM containing linear connections to the cell's internal inputs and outputs so that information can propagate unobstructed between layers;
3. computing the locally normalized distribution over semantic labels at each time step with softmax, for constrained decoding by the output layer;
4. when decoding with the Viterbi algorithm, enforcing the predefined BIO and La lattice shallow semantic labeling constraints to regularize the structural relationships between the output semantic labels.
2. The end-to-end Tibetan La lattice shallow semantic analysis method according to claim 1, characterized in that: in step one, a pretrained GloVe word-embedding table is used, V denotes the vocabulary, and C = {0, 1} denotes the set of predicate marks; the original input sequence {w_1, w_2, ..., w_T} and the mark sequence {m_1, m_2, ..., m_T} are mapped by lookup tables into low-dimensional real-valued vectors e(w_t) and e(m_t), where w_t ∈ V and the corresponding mark m_t ∈ C; the vectors e(w_t) and e(m_t) are then concatenated into x_{l,t} as the input of the first LSTM layer:
x_{l,t} = [e(w_t), e(m_t)]
where x_{l,t} is the input of layer l of the LSTM at time t, with l = 1 and t ∈ [1, T].
3. The end-to-end Tibetan La lattice shallow semantic analysis method according to claim 1, characterized in that: in step two, the first LSTM processes the input sentence in the forward direction, and the output of that layer is then used as the input of the next layer for processing in the reverse direction, laying a foundation for improving the learning of temporal features and for fully acquiring the contextual semantic information at each time step; the LSTM is defined by the standard input, forget, and output gate and cell-update equations (given as formula images in the original), where δ_l denotes the direction of the l-th LSTM layer: the direction is forward when δ_l = -1 and backward when δ_l = 1;
to stack the LSTMs in interleaved fashion, the input x_{l,t} of each layer and the direction parameter δ_l are arranged so that the first layer receives the embedded input, each subsequent layer receives the previous layer's output, and adjacent layers run in opposite directions; the input vector x_{1,t} is the concatenation of the word embedding of w_t and the embedding of the binary feature (t = v) indicating whether w_t is the given predicate.
4. The end-to-end Tibetan La lattice shallow semantic analysis method according to claim 3, characterized in that: in step two, the linear and nonlinear transformation weights between layers are controlled by the GM placed in the vertical direction of the LSTM, which balances the transfer of information in the vertical direction; with λ_{l,t} denoting the gate of the GM, the output of the hidden layer after the GM becomes a gated mixture of the layer input h_{l-1,t} and the candidate output (the gate equation is given as a formula image in the original):
h'_{l,t} = LSTM(h_{l-1,t}, h_{l,t-1}).
5. The end-to-end Tibetan La lattice shallow semantic analysis method according to claim 4, characterized in that: in step two, to reduce overfitting, dropout is applied with a shared dropout mask D_l over the hidden states (the masked-hidden-state equation is given as a formula image in the original); given the feature sequence x = {w_1, w_2, ..., w_n} of a La lattice sentence and the corresponding correct semantic tag sequence y = {y_1, y_2, ..., y_n}, the log-likelihood is:
log p(y | x) = Σ_{t=1..n} log p(y_t | x).
6. The end-to-end Tibetan La lattice shallow semantic analysis method according to claim 1, characterized in that: in step three, from the model's hidden state h_{l,t}, the locally normalized distribution over the output semantic tag y_t can be computed with softmax:
p(y_t | x) = softmax(W_o h_{l,t})^T δ_{y_t}
where W_o is the softmax parameter matrix and δ_{y_t} is a Kronecker delta (one-hot) vector whose dimension equals the number of semantic labels; the goal of model training is to maximize the probability of the correct labels given the input.
CN202210602138.3A 2022-05-30 2022-05-30 End-to-end Tibetan La lattice shallow semantic analysis method Active CN115510869B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202210602138.3A | 2022-05-30 | 2022-05-30 | End-to-end Tibetan La lattice shallow semantic analysis method


Publications (2)

Publication Number | Publication Date
CN115510869A (application publication) | 2022-12-23
CN115510869B (granted publication) | 2023-08-01

Family

ID=84500637

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202210602138.3A (Active; granted as CN115510869B) | End-to-end Tibetan La lattice shallow semantic analysis method | 2022-05-30 | 2022-05-30

Country Status (1)

Country Link
CN (1) CN115510869B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103440236A (en) * 2013-09-16 2013-12-11 中央民族大学 United labeling method for syntax of Tibet language and semantic roles
CN109408812A (en) * 2018-09-30 2019-03-01 北京工业大学 A method of the sequence labelling joint based on attention mechanism extracts entity relationship
CN111062210A (en) * 2019-12-25 2020-04-24 贵州大学 Neural network-based predicate center word identification method
CN114239574A (en) * 2021-12-20 2022-03-25 淄博矿业集团有限责任公司 Miner violation knowledge extraction method based on entity and relationship joint learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LUHENG HE et al.: "Deep Semantic Role Labeling: What Works and What's Next", Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pages 473-483 *
班玛宝: "An automatic classification model for Tibetan La lattice example sentences incorporating dual-channel syllable features", Journal of Peking University (Natural Science Edition), vol. 58, no. 1, pages 91-98 *

Also Published As

Publication number Publication date
CN115510869B (en) 2023-08-01


Legal Events

Date | Code | Title | Description
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |