CN115510869A - End-to-end Tibetan La lattice shallow semantic analysis method - Google Patents
- Publication number: CN115510869A (application CN202210602138.3A)
- Authority: CN (China)
- Prior art keywords: semantic, lattice, LSTM, Tibetan, layer
- Legal status: Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
The invention relates to the technical field of Tibetan La lattice shallow semantic analysis, and in particular to an end-to-end Tibetan La lattice shallow semantic analysis method comprising the following steps: 1. map the input character sequence, taken word by word, and the corresponding mark sequence into low-dimensional real-valued vectors; 2. place a gated highway connection mechanism (GM) in the vertical direction of the LSTM and use a BiLSTM to learn the temporal features and contextual semantic information of the input sentence; the GM contains linear connections to the cell's internal inputs and outputs, allowing information to propagate unobstructed between different layers; 3. use softmax to compute the locally normalized distribution of the semantic label at each time step, for constrained decoding by the output layer; 4. during Viterbi decoding, enforce the predefined BIO and La lattice shallow semantic labeling constraints to regularize the structural relationships among the output semantic labels. The method performs Tibetan La lattice shallow semantic analysis effectively.
Description
Technical Field
The invention relates to the technical field of Tibetan La lattice shallow semantic analysis, in particular to an end-to-end Tibetan La lattice shallow semantic analysis method.
Background
The aim of Tibetan La lattice shallow semantic analysis is to find the predicate of a given sentence, determine the main semantic components related to it with the predicate as the core, and mark them with the corresponding semantic labels, thereby obtaining the sentence-pattern type of the given sentence and the semantic roles played by its semantic components.
The La lattice is a key and difficult point in the classical Tibetan grammar treatises (the "Thirty Verses" and the treatise on phonology), and it is also a main research topic among the eight cases. For natural language processing, a Tibetan La lattice shallow semantic analysis technique based on machine learning can support many downstream Tibetan natural language understanding tasks, such as semantic role labeling, semantic parsing, information extraction, automatic question answering, reading comprehension, and machine translation. In addition, the La lattice is essential knowledge in Tibetan textbooks: only by mastering its concepts and usage can the types of Tibetan La lattice sentences be accurately distinguished, the main semantic components of each sentence be found, and the actual meaning of each sentence be further analyzed. Therefore, a Tibetan La lattice shallow semantic analysis technique based on machine learning also has practical application value for learning the La lattice.
The traditional shallow semantic analysis task is closely tied to syntactic analysis and depends heavily on syntactic analysis results, which increases the complexity of shallow semantic analysis. In recent years, as deep learning technology has matured, end-to-end models requiring no syntactic input have achieved good results on shallow semantic analysis tasks. Prior work on end-to-end LSTM shallow semantic analysis obtained results superior to traditional methods that introduce syntactic information, demonstrating the latent ability of LSTMs to capture the underlying syntactic structure of a sentence and providing a theoretical and reference basis for further research and improvement. At present, no Tibetan shallow semantic analysis method based on an end-to-end model has been found, and no literature on La lattice shallow semantic analysis has been reported.
Disclosure of Invention
The invention provides an end-to-end Tibetan La lattice shallow semantic analysis method that can overcome some of the defects of the prior art.
The end-to-end Tibetan La lattice shallow semantic analysis method disclosed by the invention comprises the following steps:
1. Map the input feature sequence, taken word by word, and the corresponding mark sequence into low-dimensional real-valued vectors;
2. Place a gated highway connection mechanism (GM) in the vertical direction of the LSTM and use a BiLSTM to learn the temporal features and contextual semantic information of the input sentence; the GM contains linear connections to the cell's internal inputs and outputs, allowing information to propagate smoothly between different layers;
3. Use softmax to compute the locally normalized distribution of the semantic labels at each time step, for constrained decoding by the output layer;
4. During Viterbi decoding, enforce the predefined BIO and La lattice shallow semantic labeling constraints to regularize the structural relationships among the output semantic labels.
Preferably, in step one, the pretrained GloVe word vectors are used, the vocabulary is denoted by V, and the mark set by C ∈ {0, 1}. The raw input sequence {w_1, w_2, …, w_T} and mark sequence {m_1, m_2, …, m_T} are mapped by a lookup table into the low-dimensional real-valued vectors e(w_t) and e(m_t), where w_t ∈ V and the corresponding mark m_t ∈ C. The vectors e(w_t) and e(m_t) are then concatenated into x_{l,t} as the input of the first LSTM layer:
x_{l,t} = [e(w_t), e(m_t)]
where x_{l,t} is the input to the LSTM at time t of layer l, with l = 1 and t ∈ [1, T].
Preferably, in step two, the first LSTM layer processes the input sentence in the forward direction, and the output of that layer is then used as the input of the next layer, which processes it in the reverse direction; this lays the foundation for improving the learning of temporal features and for fully acquiring the contextual semantic information at each time step. Each LSTM layer is defined with a direction parameter δ_l: the layer runs forward when δ_l = 1 and backward when δ_l = -1.
To stack the LSTMs in interleaved fashion, the input x_{l,t} and the direction parameter δ_l of each layer are arranged so that the direction alternates from layer to layer and each layer takes the previous layer's output as its input. The input vector x_{1,t} is the concatenation of the word embedding of w_t and the embedding of the binary feature (t = v) marking whether w_t is the given predicate.
Preferably, in step two, the linear and nonlinear transformation weights between layers are controlled by the GM placed in the vertical direction of the LSTM, which serves to balance the transfer of information in the vertical direction. Denoting the gate of the GM by λ_{l,t}, the hidden-layer output h_{l,t} after applying the GM is obtained from the candidate output
h'_{l,t} = LSTM(h_{l-1,t}, h_{l,t-1}).
preferably, in step two, to reduce overfitting, a rate of conjugate Dropout is used, with a shared Dropout mask D l Application to hidden states:
inputting a characteristic sequence x = { w) of La lattice sentences 1 ,w 2 ,…,w n H, the corresponding correct semantic tag sequence y = { y = 1 ,y 2 ,…,y n The log-likelihood of is:
preferably, in step three, the hidden state h according to the model l,t Output semantic tag y can be computed using softmax t Local normalized distribution of (c):
w in the above formula o Is a parameter matrix for softmax,is Kronecker delta, and the dimension is consistent with the number of semantic labels; the model training objective is givenThe probability of the correct label is maximized when entered.
The invention first draws on the design idea of the LSTM and balances the transfer of information in the vertical direction by placing a gated highway connection mechanism (GM) in the vertical direction of the LSTM. With the GM, information can propagate more smoothly along both the spatial and temporal dimensions with only minor information loss. Most importantly, the GM contains a "gating" function that dynamically selects or ignores the vertically propagated information, so that abstract representations at different levels can be passed to the output layer more easily. Softmax is then used to compute the locally normalized distribution of the semantic labels at each time step for constrained decoding by the output layer. Finally, during Viterbi decoding, the BIO and La lattice shallow semantic labeling constraints defined herein are enforced to regularize the structural relationships among the output semantic labels, thereby improving the accuracy of the final predicted semantic labels.
Drawings
FIG. 1 is a schematic diagram of a shallow semantic analysis model architecture of La lattice in an embodiment;
FIG. 2 is a diagram illustrating the difference between the GM-based LSTM and the ordinary LSTM in the embodiment;
FIG. 3 is a graph showing the effect of GM in the examples on experimental performance;
FIG. 4 is a diagram illustrating the effect of constrained decoding on experimental performance in an example embodiment;
FIG. 5 is a diagram illustrating an influence of a timing feature learning manner on experimental performance in an embodiment.
Detailed Description
For a further understanding of the present invention, reference is made to the following detailed description taken in conjunction with the accompanying drawings and examples. It is to be understood that the examples are illustrative of the invention and not limiting.
Examples
The embodiment provides an end-to-end Tibetan La lattice shallow semantic analysis method, which comprises the following steps:
1. Map the input character sequence, taken word by word, and the corresponding mark sequence into low-dimensional real-valued vectors;
2. Place a gated highway connection mechanism (GM) in the vertical direction of the LSTM and use a BiLSTM to learn the temporal features and contextual semantic information of the input sentence; the GM contains linear connections to the cell's internal inputs and outputs, allowing information to propagate smoothly between different layers;
3. Use softmax to compute the locally normalized distribution of the semantic labels at each time step, for constrained decoding by the output layer;
4. During Viterbi decoding, enforce the predefined BIO and La lattice shallow semantic labeling constraints to regularize the structural relationships among the output semantic labels.
Semantic role labeling (SRL) is a shallow semantic representation with the advantages of being simple and easy to use, applicable across many languages, and supported by comparatively deep research into models and algorithms, and the goal of Tibetan La lattice shallow semantic analysis is similar to that of the semantic role labeling task. The model architecture shown in FIG. 1 is therefore designed by drawing on representative SRL research, namely work based on deep BiLSTM and self-attention frameworks, and mainly comprises the following parts:
(1) Embedding layer: maps the input feature sequence, taken word by word, and the corresponding mark sequence into low-dimensional real-valued vectors;
(2) LSTM layer: to improve the model's ability to learn temporal features and to strengthen its semantic-space representation capacity, a BiLSTM is used to learn the temporal features and contextual semantic information of the input sentence;
(3) Gated highway connection: to alleviate the vanishing-gradient problem when training the BiLSTM, this embodiment uses the GM, which controls the weights of the linear and nonlinear transformations between layers;
(4) Softmax layer: maps the semantic labels that may be output at each time step into a locally normalized distribution over (0, 1) using the softmax function, for constrained decoding by the output layer;
(5) Constrained decoding layer: to constrain the structural relationships between the output semantic labels during decoding, this embodiment enforces its BIO and La lattice shallow semantic labeling constraints when decoding with the Viterbi algorithm.
Embedding layer
In the Tibetan La lattice shallow semantic analysis task, the pretrained GloVe word vectors are used, the vocabulary is denoted by V, and the mark set by C ∈ {0, 1}. The raw input sequence {w_1, w_2, …, w_T} and mark sequence {m_1, m_2, …, m_T} are mapped by a lookup table into the low-dimensional real-valued vectors e(w_t) and e(m_t), where w_t ∈ V and the corresponding mark m_t ∈ C. The vectors e(w_t) and e(m_t) are then concatenated into x_{l,t} as the input of the first LSTM layer:
x_{l,t} = [e(w_t), e(m_t)]
where x_{l,t} is the input to the LSTM at time t of layer l, with l = 1 and t ∈ [1, T].
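To make the lookup-and-concatenate step concrete, a minimal sketch follows; the vocabulary size, embedding dimensions, and the helper name embed are invented for illustration, and the GloVe table is replaced by random values.

```python
import numpy as np

# Hypothetical sizes: |V| words, GloVe dimension, and a binary predicate mark {0, 1}.
V_SIZE, D_WORD, D_MARK = 10000, 100, 2

rng = np.random.default_rng(0)
glove = rng.normal(size=(V_SIZE, D_WORD))      # stand-in for the pretrained GloVe table
mark_table = rng.normal(size=(2, D_MARK))      # embeddings for the mark m_t in {0, 1}

def embed(word_ids, mark_ids):
    """Map word ids w_t and marks m_t to x_{1,t} = [e(w_t); e(m_t)]."""
    e_w = glove[word_ids]                       # (T, D_WORD)
    e_m = mark_table[mark_ids]                  # (T, D_MARK)
    return np.concatenate([e_w, e_m], axis=-1)  # (T, D_WORD + D_MARK)

x1 = embed(np.array([12, 7, 401]), np.array([0, 1, 0]))
print(x1.shape)  # (3, 102): input to the first LSTM layer
```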
Bidirectional LSTM
This embodiment converts Tibetan La lattice shallow semantic analysis into an end-to-end sequence labeling task. Sequence labeling depends heavily on the ability to learn temporal features and textual context information, so the first LSTM layer processes the input sentence in the forward direction, and the output of that layer is then used as the input of the next layer, which processes it in the reverse direction; this lays the foundation for improving the learning of temporal features and for fully acquiring the contextual semantic information at each time step. Each LSTM layer is defined with a direction parameter δ_l: the layer runs forward when δ_l = 1 and backward when δ_l = -1.
To stack the LSTMs in interleaved fashion, the input x_{l,t} and the direction parameter δ_l of each layer are arranged so that the direction alternates from layer to layer and each layer takes the previous layer's output as its input. The input vector x_{1,t} is the concatenation of the word embedding of w_t and the embedding of the binary feature (t = v) marking whether w_t is the given predicate.
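One way to realize the interleaved stacking described above, with the direction alternating from layer to layer and each layer consuming the previous layer's output, is sketched below in PyTorch. The class name, layer sizes, and the convention that even-indexed layers run forward are assumptions for illustration; the sketch does not reproduce the patent's exact cell equations.

```python
import torch
import torch.nn as nn

class InterleavedLSTM(nn.Module):
    """Stack of unidirectional LSTMs whose direction alternates layer by layer."""
    def __init__(self, input_size, hidden_size, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.LSTM(input_size if l == 0 else hidden_size, hidden_size, batch_first=True)
            for l in range(num_layers)
        )

    def forward(self, x):                      # x: (batch, T, input_size)
        h = x
        for l, lstm in enumerate(self.layers):
            delta = 1 if l % 2 == 0 else -1    # direction parameter delta_l (assumed convention)
            if delta == -1:                    # reverse the sequence for backward layers
                h = torch.flip(h, dims=[1])
            h, _ = lstm(h)
            if delta == -1:                    # restore original time order
                h = torch.flip(h, dims=[1])
        return h                               # (batch, T, hidden_size)

model = InterleavedLSTM(input_size=102, hidden_size=64)
print(model(torch.randn(2, 5, 102)).shape)     # torch.Size([2, 5, 64])
```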
GM based on LSTM
In order to alleviate the vanishing-gradient problem when training the BiLSTM, the linear and nonlinear transformation weights between layers are controlled by the GM placed in the vertical direction of the LSTM, which serves to balance the transfer of information in the vertical direction. Denoting the gate of the GM by λ_{l,t}, the hidden-layer output h_{l,t} after applying the GM is obtained from the candidate output
h'_{l,t} = LSTM(h_{l-1,t}, h_{l,t-1}).
for clarity, a difference plot of GM based LSTM and normal LSTM is given, see fig. 2 for details.
In FIG. 2, h_{l-1,t} denotes the output of the previous layer, which is also the input of the current layer, and h'_{l,t} denotes the candidate output, i.e. the output of the LSTM. The GM makes a linear connection between h_{l-1,t} and h'_{l,t}, which greatly aids the high-speed transfer of information in the vertical direction, and λ_{l,t} decides how much information from the lower layer is passed directly up to the current layer. During training, the closer λ_{l,t} is to 1, the more information is passed upward unchanged; when λ_{l,t} = 1, the input is copied directly to the output without any change, so the GM mechanism lets bottom-layer information flow more smoothly to the top layer. Conversely, the closer λ_{l,t} is to 0, the less information is passed upward directly; when λ_{l,t} = 0, the GM degrades to the traditional LSTM. Since the GM operates inside the neuron, the transfer of lower-layer information along the temporal direction is not affected.
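A minimal sketch of the gating behaviour described above is given below: λ_{l,t} mixes the layer input h_{l-1,t} with the candidate LSTM output h'_{l,t}, so that λ close to 1 copies the input upward and λ close to 0 reduces to the ordinary LSTM. The mixing rule and the way λ is computed from the layer input are assumptions consistent with this description, not the patent's exact equations.

```python
import torch
import torch.nn as nn

class GatedHighwayLSTMLayer(nn.Module):
    """One LSTM layer with a gated highway (GM-style) connection in the vertical direction."""
    def __init__(self, size):
        super().__init__()
        self.lstm = nn.LSTM(size, size, batch_first=True)
        self.gate = nn.Linear(size, size)       # produces lambda_{l,t} from the layer input

    def forward(self, h_prev_layer):             # h_prev_layer: (batch, T, size) = h_{l-1,t}
        h_candidate, _ = self.lstm(h_prev_layer)  # h'_{l,t}, the ordinary LSTM output
        lam = torch.sigmoid(self.gate(h_prev_layer))  # lambda_{l,t} in (0, 1)
        # lambda -> 1: copy the lower layer straight up; lambda -> 0: plain LSTM behaviour.
        return lam * h_prev_layer + (1.0 - lam) * h_candidate

layer = GatedHighwayLSTMLayer(size=64)
print(layer(torch.randn(2, 5, 64)).shape)         # torch.Size([2, 5, 64])
```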
To reduce overfitting, dropout is applied with a shared dropout mask D_l on the hidden states.
Given the input feature sequence x = {w_1, w_2, …, w_n} of a La lattice sentence and the corresponding correct semantic label sequence y = {y_1, y_2, …, y_n}, the log-likelihood is
log p(y | x) = Σ_{t=1}^{n} log p(y_t | x).
softmax layer
From the model's hidden state h_{l,t}, the locally normalized distribution of the output semantic label y_t can be computed using softmax:
p(y_t | x) = softmax(W_o h_{l,t})^T δ_{y_t}
where W_o is the softmax parameter matrix and δ_{y_t} is a Kronecker delta (one-hot) vector whose dimension equals the number of semantic labels; the goal of model training is to maximize the probability of the correct labels given the input.
Constraint decoding layer
In order to incorporate constraints on the output structure during decoding, this embodiment defines BIO and La lattice shallow semantic labeling constraints based on the BIO sequence labeling scheme and the La lattice shallow semantic label specification. Both constraints are then enforced when decoding with the Viterbi algorithm; they are exemplified as follows, and a small sketch of such constrained decoding is given after Table 1:
(1) BIO constraint
BIO is a commonly used sequence labeling scheme in NLP: B marks the beginning of a labeled segment, I marks the middle or end of a labeled segment, and O marks everything else. The constraint rejects any sequence that does not produce valid BIO transitions, e.g. B-A0 immediately followed by I-A1.
(2) La lattice shallow semantic annotation constraint
Unique semantic tags: for each La lattice sentence pattern, the semantic tags A0, A1, A2 and the pattern-specific tags in Table 1 may each appear at most once;
Restricted semantic tags: any cross-appearance of the pattern-specific tags in Table 1 across different sentence patterns is rejected, e.g. AM-Bas followed by AM-L_2;
Sequential semantic tags: any pattern-specific tag AM-L_i in Table 1 that appears out of order before another pattern-specific tag is rejected, e.g. AM-L_1 followed by AM-Bas;
Continuation semantic tags: a continuation tag may appear only if its corresponding base tag appears before it, e.g. I-A0 may only follow B-A0.
TABLE 1 Definition of pattern-specific and common semantic tags
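A small sketch of constrained Viterbi decoding of the kind described above follows. Only the generic BIO transition constraint is implemented (an I-X tag may only continue B-X or I-X and may not start a sentence); the La lattice specific constraints of Table 1 are not reproduced here, and the tag inventory and scores are invented for illustration.

```python
import numpy as np

def allowed(prev_tag, tag):
    """BIO constraint: I-X may only follow B-X or I-X with the same role X."""
    if tag.startswith("I-"):
        return prev_tag in ("B-" + tag[2:], tag)
    return True                                    # B-* and O transitions are unrestricted here

def constrained_viterbi(log_probs, tags):
    """Viterbi decoding over per-step log-probabilities with hard transition constraints."""
    T, n = log_probs.shape
    start_ok = np.array([not t.startswith("I-") for t in tags])
    score = np.where(start_ok, log_probs[0], -np.inf)  # I-X may not start the sequence
    back = np.zeros((T, n), dtype=int)
    for t in range(1, T):
        new_score = np.full(n, -np.inf)
        for j, tag in enumerate(tags):
            for i, prev in enumerate(tags):
                if allowed(prev, tag) and score[i] + log_probs[t, j] > new_score[j]:
                    new_score[j] = score[i] + log_probs[t, j]
                    back[t, j] = i
        score = new_score
    path = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return [tags[j] for j in reversed(path)]

tags = ["O", "B-A0", "I-A0", "B-A1", "I-A1"]
rng = np.random.default_rng(2)
log_probs = np.log(rng.dirichlet(np.ones(len(tags)), size=6))
print(constrained_viterbi(log_probs, tags))        # never emits I-X without a preceding B-X/I-X
```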
Because the La lattice particles can in principle be substituted for one another in the five usages of the La lattice (the objective, purpose, locative, identity, and time cases), they are collectively called La lattice particles. The La lattice includes two kinds of particles, free and non-free: the attachment of a non-free particle is restricted by the suffix letter of the preceding syllable, whereas a free particle may be attached regardless of the preceding syllable.
According to the semantic functions, attachment rules, and different usages of the La lattice particles, La lattice sentences can be divided into five sentence patterns: the objective-case, purpose-case, locative-case, identity-case, and time-case patterns:
business grid sentence: sentences that represent actions that have been or are being performed and actions to be performed at a certain place of implementation [9] . One characteristic of a sentence is that there is an implementation place in the sentenceTo perform ground movementAnd La lattice particleThree main semantic components, one of which is absent. Such as " (in library learning)', the place where the action is performed is "(bookstore) ", the action of implementation is"(learning)' and La lattice syllabary areIf the machine can correctly recognize and distinguish these semantic components, the semantics can be basically correctly understood.
Purpose-case sentence: a sentence expressing an action performed for a certain purpose. One characteristic of this pattern is that the sentence contains three main semantic components, namely the purpose, the action performed to achieve that purpose, and the La lattice particle, none of which may be missing. For example, in "(striving to gain knowledge)", the purpose is "(to gain knowledge)", the action performed to achieve it is "(striving)", and the La lattice particle links them. If the machine can correctly recognize and distinguish these semantic components, the semantics can basically be understood correctly.
Locative-case sentence: a sentence expressing that something exists at, or depends on, a certain place to which a La lattice particle is attached. This pattern has two main characteristics. First, the sentence contains three main semantic components: the place to which the La lattice particle is attached, the dependent object, and the La lattice particle. For example, in "(there are many students in the classroom)", the place is "(classroom)", the dependent object is "(students)", and the La lattice particle links them; if the machine can correctly recognize and distinguish these semantic components, the semantics can basically be understood correctly. Second, when the predicate of this pattern is an existential auxiliary word, the dependent-object component of the first characteristic may be absent; in that case the place is "(classroom)", the existential auxiliary is "(there is)", and the La lattice particle links them. If the machine can correctly recognize and distinguish these semantic components, it can likewise basically understand the semantics correctly.
Identity-case sentence: a sentence expressing that something changes into something else, so that the two share the property of "identity" as the result of the change brought about by the action. This pattern has two main characteristics. First, the sentence contains three main semantic components: the thing that changes, the thing it changes into, and the La lattice particle. For example, in "(translating Chinese into Tibetan)", the thing that changes is "(Chinese)", the thing it changes into is "(Tibetan)", and the La lattice particle links them; if the machine can correctly recognize and distinguish these semantic components, the semantics can basically be understood correctly. Second, the sentence may instead contain the result of the change brought about by the action, the action itself, and the La lattice particle as its three main semantic components; for example, in "(making him happy)", the result of the change is "(happy)", the action is "(making)", and the La lattice particle links them. If the machine can correctly recognize and distinguish these semantic components, it can likewise basically understand the semantics correctly.
Time-case sentence: a sentence expressing the time at which a certain action is performed. This pattern has two main characteristics. First, the sentence contains three main semantic components: the time at which the action is performed, the action performed, and the La lattice particle. For example, in "(studied for three years)", the time at which the action is performed is "(three years)", the action performed is "(studying)", and the La lattice particle links them; if the machine can correctly recognize and distinguish these semantic components, the semantics can basically be understood correctly. Second, the sentence may contain only two semantic components, the time at which the action is performed and the La lattice particle, with the performed action itself absent; this phenomenon is fairly common, the time again being "(three years)" with the La lattice particle attached to it. If the machine can correctly recognize and distinguish these semantic components, it can likewise basically understand the semantics correctly.
Experiment of
Since there is no publicly available Tibetan La lattice shallow semantic analysis dataset, this embodiment first extracts 20,000 sentences containing exactly one La lattice particle from the Tibetan corpus built by the laboratory's research group. After preprocessing, 12,000 sentences are selected for La lattice shallow semantic annotation. Finally, following the La lattice shallow semantic annotation specification formulated in this embodiment, the Tibetan La lattice shallow semantic analysis dataset is constructed by manual annotation; for convenience it is referred to as TLSD below. In the experiments, the TLSD dataset is divided into a training set, a validation set, and a test set, with the training set taking the largest share (a ratio of 8).
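A simple way to produce such a split is sketched below; the 8:1:1 proportions and the random seed are assumptions for illustration, since the source gives only the training share of the ratio.

```python
import random

def split_dataset(sentences, train=0.8, dev=0.1, seed=42):
    """Shuffle and split annotated sentences into train/validation/test subsets."""
    data = list(sentences)
    random.Random(seed).shuffle(data)
    n_train = int(len(data) * train)
    n_dev = int(len(data) * dev)
    return data[:n_train], data[n_train:n_train + n_dev], data[n_train + n_dev:]

train_set, dev_set, test_set = split_dataset(range(12000))
print(len(train_set), len(dev_set), len(test_set))  # 9600 1200 1200
```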
During the experiments, to ensure the comparability of experimental results, the tuning ranges of the hyperparameters of all models are restricted, and the current optimal hyperparameter combination is selected within the restricted range after multiple rounds of tuning; the model parameters are detailed in Table 2.
TABLE 2 model parameters
Baseline method
Since no literature on Tibetan La lattice shallow semantic analysis is currently available for reference, and no Tibetan La lattice shallow semantic analysis dataset has been published, the effectiveness of the model of this embodiment cannot be verified by direct comparison with previous work.
For this reason, several methods from the references used in building the model of this embodiment are selected as baseline models to verify the effectiveness of the model of this embodiment.
(1) LSTM+CRF: a semantic role labeling method based on deep bidirectional RNNs.
(2) DBLSTM: a semantic role labeling method based on deep bidirectional LSTMs.
(3) Self-Attention: a semantic role labeling method based on the self-attention mechanism.
(4) End-to-End: a span-based end-to-end semantic role labeling method.
(5) BiLSTM+GM+CD and BiLSTM+GM+CD(V) are the models of this embodiment, referring respectively to the Tibetan La lattice shallow semantic analysis model when the model must predict the predicates in the sentence itself and when the predicates are given in advance.
Evaluation index
In this embodiment, accuracy (ACC), a commonly used evaluation metric for sequence labeling tasks, is selected to evaluate model performance. Let TP denote positive samples predicted as positive, FP negative samples predicted as positive, FN positive samples predicted as negative, and TN negative samples predicted as negative; the accuracy (ACC) is calculated as:
ACC = (TP + TN) / (TP + FP + FN + TN)
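For sequence labeling this reduces to the fraction of positions whose predicted label matches the gold label; a small helper illustrating that token-level computation follows, with the label names invented for the example.

```python
def accuracy(gold_tags, pred_tags):
    """Token-level accuracy: fraction of positions whose predicted label equals the gold label."""
    assert len(gold_tags) == len(pred_tags)
    correct = sum(g == p for g, p in zip(gold_tags, pred_tags))
    return correct / len(gold_tags)

print(accuracy(["B-A0", "I-A0", "O", "B-A1"], ["B-A0", "I-A0", "O", "O"]))  # 0.75
```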
results and analysis of the experiments
Comparison of model Performance on TLSD datasets
In order to verify the effectiveness and superiority of the model of the embodiment, the Tibetan language La lattice shallow semantic analysis performance of the baseline model and the model of the embodiment is compared, and the experimental comparison result of each model is shown in Table 3.
TABLE 3 experimental comparison of the models
As can be seen from the experimental results in Table 3, the performance of the model of this embodiment is improved over the several baseline models. When the model predicts the predicates itself, its La lattice shallow semantic analysis accuracy on the test set is 3.33, 3.01, 1.58, and 1.87 percentage points higher than that of LSTM+CRF, DBLSTM, Self-Attention, and End-to-End respectively, which shows that the Tibetan La lattice shallow semantic analysis performance of the model of this embodiment is better. In addition, the model of this embodiment also performs well when jointly predicting the predicates and the other corresponding semantic labels: its La lattice shallow semantic analysis accuracy on the test set is only 0.93 percentage points lower than when the predicates are given in advance, which verifies the superiority of the model of this embodiment.
The Tibetan La lattice shallow semantic analysis task yields good results for both the baseline models and the model of this embodiment, for three main reasons: first, all sentences in the TLSD dataset are single Tibetan sentences containing exactly one La lattice particle; second, the sentences in the TLSD dataset are not long, with lengths between 4 and 30 words; third, compared with the semantic role labeling task, La lattice shallow semantic labels are easier to identify and annotate. The reasons why the model of this embodiment performs better than the baseline models are twofold: first, the GM balances the transfer of information in the vertical direction and further alleviates the vanishing-gradient problem; second, constraining the output structure during decoding yields a more reasonable output semantic label structure.
Validation of GM
To verify the effectiveness of the GM, the La lattice shallow semantic analysis performance of the model is examined when using the GM-based BiLSTM and when using the ordinary BiLSTM. The first configuration uses the ordinary BiLSTM and the second uses the GM-based BiLSTM; the experimental results on the test set are shown in FIG. 3.
As can be seen from FIG. 3, the accuracy with the GM-based BiLSTM is 1.06% higher than with the ordinary BiLSTM, confirming the effectiveness of the GM.
Validity verification of constrained decoding method
To verify the effectiveness of the constrained decoding method, the La lattice shallow semantic analysis performance of the model is examined with and without constrained decoding. The first configuration does not use constrained decoding and the second does; the experimental results on the test set are shown in FIG. 4.
As can be seen from FIG. 4, the La lattice shallow semantic analysis accuracy of the model is 0.76% higher with constrained decoding than without, verifying the effectiveness of the constrained decoding method.
Impact of the temporal-feature learning method on model performance
To examine the influence of the temporal-feature learning method on model performance, the La lattice shallow semantic analysis performance of the model is compared when using an LSTM and when using a BiLSTM. The first configuration learns temporal features with an LSTM and the second with a BiLSTM; the experimental results on the test set are shown in FIG. 5.
As can be seen from FIG. 5, the La lattice shallow semantic analysis accuracy of the model is 2.57% higher when the temporal features are learned with the BiLSTM than with the LSTM, indicating that the model performs better when the temporal features are learned with the BiLSTM.
Conclusion
To alleviate the vanishing-gradient problem and balance the transfer of information in the vertical direction, this embodiment mixes linear and nonlinear information by placing a GM in the vertical direction of the LSTM, so that information propagates more smoothly along the spatial and temporal dimensions. To regularize the structural relationships between the output semantic labels and improve the accuracy of the predicted semantic labels, the BIO and La lattice shallow semantic labeling constraints defined in this embodiment are enforced during Viterbi decoding. The experimental results show that on the test set the La lattice shallow semantic analysis accuracy of this method reaches 90.59%, outperforming several baseline models.
The invention and its embodiments have been described above schematically, and the description is not limiting; what is shown in the drawings is only one embodiment of the invention, and the actual structure is not limited thereto. Therefore, similar structures and embodiments designed by a person skilled in the art in light of this teaching, without inventive effort and without departing from the spirit of the invention, shall fall within the protection scope of the invention.
Claims (6)
1. An end-to-end Tibetan La lattice shallow semantic analysis method, characterized by comprising the following steps:
1. Map the input character sequence, taken word by word, and the corresponding mark sequence into low-dimensional real-valued vectors;
2. Place a gated highway connection mechanism (GM) in the vertical direction of the LSTM and use a BiLSTM to learn the temporal features and contextual semantic information of the input sentence; the GM contains linear connections to the cell's internal inputs and outputs, allowing information to propagate unobstructed between different layers;
3. Use softmax to compute the locally normalized distribution of the semantic labels at each time step, for constrained decoding by the output layer;
4. During Viterbi decoding, enforce the predefined BIO and La lattice shallow semantic labeling constraints to regularize the structural relationships among the output semantic labels.
2. The end-to-end Tibetan La lattice shallow semantic analysis method according to claim 1, characterized in that: in step one, the pretrained GloVe word vectors are used, the vocabulary is denoted by V, and the mark set by C ∈ {0, 1}; the raw input sequence {w_1, w_2, …, w_T} and mark sequence {m_1, m_2, …, m_T} are mapped by a lookup table into the low-dimensional real-valued vectors e(w_t) and e(m_t), where w_t ∈ V and the corresponding mark m_t ∈ C; the vectors e(w_t) and e(m_t) are then concatenated into x_{l,t} as the input of the first LSTM layer:
x_{l,t} = [e(w_t), e(m_t)]
where x_{l,t} is the input to the LSTM at time t of layer l, with l = 1 and t ∈ [1, T].
3. The end-to-end Tibetan La lattice shallow semantic analysis method according to claim 1, characterized in that: in step two, the first LSTM layer processes the input sentence in the forward direction, and the output of that layer is then used as the input of the next layer, which processes it in the reverse direction, laying the foundation for improving the learning of temporal features and for fully acquiring the contextual semantic information at each time step; each LSTM layer is defined with a direction parameter δ_l, the layer running forward when δ_l = 1 and backward when δ_l = -1;
to stack the LSTMs in interleaved fashion, the input x_{l,t} and the direction parameter δ_l of each layer are arranged so that the direction alternates from layer to layer and each layer takes the previous layer's output as its input; the input vector x_{1,t} is the concatenation of the word embedding of w_t and the embedding of the binary feature (t = v) marking whether w_t is the given predicate.
4. The end-to-end Tibetan La lattice shallow semantic analysis method according to claim 3, characterized in that: in step two, the linear and nonlinear transformation weights between layers are controlled by the GM placed in the vertical direction of the LSTM, which serves to balance the transfer of information in the vertical direction; denoting the gate of the GM by λ_{l,t}, the hidden-layer output h_{l,t} after applying the GM is obtained from the candidate output
h'_{l,t} = LSTM(h_{l-1,t}, h_{l,t-1}).
5. The end-to-end Tibetan La lattice shallow semantic analysis method according to claim 4, characterized in that: in step two, to reduce overfitting, dropout is applied with a shared dropout mask D_l on the hidden states; given the input feature sequence x = {w_1, w_2, …, w_n} of a La lattice sentence and the corresponding correct semantic label sequence y = {y_1, y_2, …, y_n}, the log-likelihood is
log p(y | x) = Σ_{t=1}^{n} log p(y_t | x).
6. The end-to-end Tibetan La lattice shallow semantic analysis method according to claim 1, characterized in that: in step three, from the model's hidden state h_{l,t}, the locally normalized distribution of the output semantic label y_t can be computed using softmax:
p(y_t | x) = softmax(W_o h_{l,t})^T δ_{y_t}.
Priority Application (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202210602138.3A | 2022-05-30 | 2022-05-30 | End-to-end Tibetan La lattice shallow semantic analysis method

Publications (2)

Publication Number | Publication Date
---|---
CN115510869A | 2022-12-23
CN115510869B | 2023-08-01
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103440236A (en) * | 2013-09-16 | 2013-12-11 | 中央民族大学 | United labeling method for syntax of Tibet language and semantic roles |
CN109408812A (en) * | 2018-09-30 | 2019-03-01 | 北京工业大学 | A method of the sequence labelling joint based on attention mechanism extracts entity relationship |
CN111062210A (en) * | 2019-12-25 | 2020-04-24 | 贵州大学 | Neural network-based predicate center word identification method |
CN114239574A (en) * | 2021-12-20 | 2022-03-25 | 淄博矿业集团有限责任公司 | Miner violation knowledge extraction method based on entity and relationship joint learning |
Non-Patent Citations (2)
Title |
---|
LUHENG HE et al.: "Deep Semantic Role Labeling: What Works and What's Next", Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pages 473-483 *
班玛宝: "Automatic classification model for Tibetan La lattice example sentences fusing dual-channel syllable features", Acta Scientiarum Naturalium Universitatis Pekinensis (Journal of Peking University, Natural Science Edition), vol. 58, no. 1, pages 91-98 *
Legal Events

Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant