WO2021179570A1 - Sequence labeling method, apparatus, computer device and storage medium - Google Patents

Sequence labeling method, apparatus, computer device and storage medium

Info

Publication number
WO2021179570A1
Authority
WO
WIPO (PCT)
Prior art keywords
word
text
labeled
matrix
vector
Prior art date
Application number
PCT/CN2020/117162
Other languages
English (en)
French (fr)
Inventor
陈桢博
金戈
徐亮
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021179570A1 publication Critical patent/WO2021179570A1/zh

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/088: Non-supervised learning, e.g. competitive learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/40: Scenes; Scene-specific elements in video content
    • G06V 20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items

Definitions

  • This application relates to the field of artificial intelligence technology, in particular to a sequence labeling method, device, computer equipment and storage medium.
  • The sequence labeling task in Natural Language Processing (NLP) assigns a label to each character or word of a text; it includes tasks such as named entity recognition, part-of-speech tagging, and knowledge entity extraction, and is usually implemented with supervised learning algorithms.
  • Existing implementations include traditional machine learning algorithms (such as CRF) and deep learning algorithms (such as Bi-LSTM), with deep learning generally achieving better accuracy.
  • Current state-of-the-art deep learning approaches additionally add an attention mechanism to this type of task in order to extract feature information about the correlation weights between sequence units.
  • The inventor realized that the attention mechanism is based on matrix operations, which causes a high computational load during model training and therefore a long running time. The existing technology therefore needs to be improved in order to obtain a better user experience.
  • A sequence labeling method includes: obtaining the text to be labeled and determining its character vectors, word vectors, and position vectors; extracting the feature information of these vectors; calculating, according to the feature information, the attention weight matrix between the characters of the text to be labeled, so as to map attention weights onto the relationships between the characters; and adding the fully connected layer feature matrix to the attention weight matrix to calculate the probability that each character of the text belongs to each label, outputting the highest-probability label of each character as the predicted label sequence.
  • A sequence labeling device includes a sequence labeling model, and the sequence labeling model includes:
  • Embedding layer: used to obtain the text to be labeled and convert it into vector form, where the vector form includes the character vector, word vector, and position vector of each character;
  • Convolutional layer: used to extract the feature information of the vectors output by the embedding layer, and to calculate, according to the feature information, the attention weight matrix between the characters of the text to be labeled, so as to map attention weights onto the relationships between the characters;
  • CRF layer: used to add the fully connected layer feature matrix to the attention weight matrix output by the convolutional layer, so as to calculate the probability that each character of the text to be labeled belongs to each label;
  • Output layer: used to output, for each character of the text to be labeled, the label with the highest probability computed in the CRF layer as the predicted label sequence.
  • A computer device includes a memory and a processor; computer-readable instructions are stored in the memory, and when executed by the processor they cause the processor to perform the steps of the above sequence labeling method.
  • A storage medium stores computer-readable instructions; when executed by one or more processors, the instructions cause the one or more processors to perform the steps of the above sequence labeling method.
  • Compared with the supervised learning algorithms of existing NLP technology, the above sequence labeling method and device compute the character vectors and position vectors of the text in the embedding layer and extract local feature vectors of the character vectors, word vectors, and position vectors in the convolutional layer. The attention mechanism of the EM algorithm (Expectation Maximization algorithm) is then used to calculate the correlation weights between the characters of the text, and finally the probability that each character belongs to each label is calculated from these weights.
  • The label sequence formed by the highest-probability label of each character is output as the prediction result, i.e. the sequence labeling of the text.
  • This application uses the convolutional neural network of deep learning and draws on the attention mechanism of the EM algorithm from the CV (Computer Vision) field. The EM attention computation reduces the amount of calculation for long texts in the NLP sequence labeling task, improves the efficiency of sequence labeling, and preserves the accuracy of the task.
  • Fig. 1 is an implementation environment diagram of a sequence labeling method provided in an embodiment
  • Figure 2 is a block diagram of the internal structure of a computer device in an embodiment
  • Figure 3 is a flowchart of a sequence labeling method in an embodiment
  • Figure 4 is a structural block diagram of a sequence labeling device in an embodiment
  • Figure 5 is a structural block diagram of a convolutional layer in an embodiment
  • Fig. 6 is a structural block diagram of the CRF layer in an embodiment.
  • FIG. 1 is an implementation environment diagram of a sequence labeling method provided in an embodiment. As shown in FIG. 1, the implementation environment includes a computer device 110 and a terminal 120.
  • The computer device 110 is a sequence labeling processing device, for example a computer used by a tester, on which a sequence labeling processing tool is installed.
  • The terminal 120 is installed with an application that requires sequence labeling processing. When sequence labeling is needed, the tester can issue a sequence labeling processing request at the terminal 120; the request carries a sequence labeling processing identifier.
  • The computer device 110 receives the sequence labeling processing request, obtains the test script corresponding to the identifier, executes the script with the sequence labeling processing tool to test the application on the terminal 120, and obtains the sequence labeling processing result corresponding to the test script.
  • It should be noted that the terminal 120 and the computer device 110 may each be a smart phone, a tablet computer, a notebook computer, a desktop computer, a server, etc., but are not limited thereto.
  • the computer device 110 and the terminal 120 may be connected via Bluetooth, USB (Universal Serial Bus, Universal Serial Bus) or other communication connection methods, which is not limited in this application.
  • Figure 2 is a schematic diagram of the internal structure of a computer device in an embodiment.
  • the computer device includes a processor, a non-volatile storage medium, a memory, and a network interface connected through a system bus.
  • The non-volatile storage medium of the computer device stores an operating system, a database, and a computer program; the database may store control information sequences, and when the computer program is executed by the processor it enables the processor to implement a sequence labeling processing method.
  • The processor of the computer device provides computing and control capabilities and supports the operation of the entire device.
  • a computer readable instruction may be stored in the memory of the computer device, and when the computer readable instruction is executed by the processor, the processor may execute a sequence labeling processing method.
  • the network interface of the computer device is used to connect and communicate with the terminal.
  • Those skilled in the art will understand that FIG. 2 is only a block diagram of the part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or arrange the components differently.
  • As shown in FIG. 3, in one embodiment a method for sequence labeling of natural language text is proposed. The method can be applied to the above-mentioned computer device 110 and may specifically include the following steps S302 to S308:
  • Step S302: obtain the text to be labeled, and determine the character vectors, word vectors, and position vectors of the text to be labeled;
  • In this embodiment, obtaining the text to be labeled is generally performed by a computer device; here a background server is used, although the task is not limited to a server and may also be undertaken by the other computer devices mentioned above. In the sequence labeling of natural language text, the background server carries out the sequence labeling computation: a sequence labeling detector is deployed on the background server, and after the detector receives a sequence labeling detection request it obtains the text to be labeled and stores it in memory.
  • In some embodiments, the text to be labeled may also be stored in a non-volatile storage medium for processing.
  • In this embodiment, the text information is converted into vector form, comprising character vectors, word vectors, and position vectors. According to a character vector dictionary, the characters of a text of length m can be mapped one by one to vectors of length n, thereby constructing an m*n matrix.
  • For example, if the text input is ['苹', '果'], the two characters '苹' and '果' can each be mapped to a 300-dimensional vector, thereby constructing a 2*300 matrix.
  • Character and word vectors are generally generated with the classic Word2Vec algorithm, an unsupervised learning algorithm: the sentences of the training corpus are encoded in one-hot form (one-hot encoding, also known as one-of-N encoding), and the c-bow method (predicting the middle word from its context) or the skip-gram method (predicting the context from the middle word) is used to build triples of middle-word one-hot encoding, middle-word feature encoding, and context-word one-hot encoding. Since the one-hot encodings are known, the character vector or word vector of a character or word is obtained by training the intermediate feature encoding.
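  • By way of illustration only, character or word vectors of this kind can be trained with an off-the-shelf Word2Vec implementation. The sketch below uses the gensim library (version 4.x API assumed) with skip-gram and 300-dimensional vectors; the toy corpus and all parameter values are assumptions for the example, not the training setup of this application.

```python
from gensim.models import Word2Vec

# Toy corpus of pre-segmented sentences; in practice this would be the training corpus.
sentences = [['苹', '果', '很', '好', '吃'], ['平', '安', '科', '技']]

# sg=1 selects skip-gram (predict context from the middle word); sg=0 would select CBOW (c-bow).
model = Word2Vec(sentences, vector_size=300, window=2, min_count=1, sg=1)

vec = model.wv['苹']   # 300-dimensional character vector
print(vec.shape)        # (300,)
```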
  • The position vector follows the method proposed by Google: because feature extraction of the encoded text with a convolutional neural network ignores word order, a position vector is added so that the model can use the order of the word vector sequence. The position vector PE is computed as PE(pos, 2i) = sin(pos/10000^(2i/d)) and PE(pos, 2i+1) = cos(pos/10000^(2i/d)), where pos is the position of a character, i is the i-th dimension, and d is the configured dimension of the position vector.
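  • As a minimal sketch of the embedding step, the code below maps characters to vectors with a lookup table and adds the sinusoidal position vector defined by the formulas above. The vocabulary, the randomly initialised embedding table (standing in for trained Word2Vec vectors), and the variable names are assumptions made for illustration.

```python
import numpy as np

def position_encoding(seq_len: int, d: int) -> np.ndarray:
    """PE(pos, 2i) = sin(pos/10000^(2i/d)), PE(pos, 2i+1) = cos(pos/10000^(2i/d))."""
    pe = np.zeros((seq_len, d))
    pos = np.arange(seq_len)[:, None]              # character positions
    i = np.arange(0, d, 2)[None, :]                # even dimension indices
    angle = pos / np.power(10000.0, i / d)
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe

char_vocab = {'苹': 0, '果': 1}                     # hypothetical character-vector dictionary
d = 300
emb_table = np.random.randn(len(char_vocab), d) * 0.01   # stands in for trained character vectors

text = ['苹', '果']                                 # text to be labeled, length m = 2
char_vecs = emb_table[[char_vocab[c] for c in text]]       # m*n matrix (here 2*300)
embedded = char_vecs + position_encoding(len(text), d)     # embedding layer output
print(embedded.shape)                               # (2, 300)
```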
  • Step S304: extract the feature information of the character vectors, word vectors, and position vectors;
  • In this embodiment, a single 1-dimensional convolutional layer is first constructed to reduce the feature dimension, and then multiple 1-dimensional convolutional layers are stacked to extract local feature information. The input vector dimension is m*n, the 1-dimensional convolution kernel dimension is preset to 3*n, and the number of channels is c; the kernel performs a sliding convolution with stride 1 along the first dimension, and the final output of the multi-layer convolution is a matrix of dimension m*c.
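  • A minimal sketch of this feature extraction stage is given below using PyTorch (an assumed choice; the filing does not name a framework). The kernel width of 3, stride of 1, channel count c, and the initial dimension-reducing layer follow the text, while the layer count, the use of a width-1 convolution for the reduction, and the ReLU activations are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ConvFeatureExtractor(nn.Module):
    """Maps an m*n input matrix to an m*c feature matrix with stacked 1-D convolutions."""
    def __init__(self, n: int, c: int, num_layers: int = 3):
        super().__init__()
        layers = [nn.Conv1d(n, c, kernel_size=1)]   # first 1-D layer reduces the feature dimension
        for _ in range(num_layers):                  # stacked width-3, stride-1 convolutions
            layers += [nn.Conv1d(c, c, kernel_size=3, stride=1, padding=1), nn.ReLU()]
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, m, n); Conv1d expects channels first, so transpose to (batch, n, m).
        return self.net(x.transpose(1, 2)).transpose(1, 2)   # (batch, m, c)

feats = ConvFeatureExtractor(n=300, c=128)(torch.randn(1, 2, 300))
print(feats.shape)   # torch.Size([1, 2, 128])
```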
  • Step S306: calculate, according to the feature information, the attention weight matrix between the characters of the text to be labeled, so as to map attention weights onto the relationships between the characters;
  • In this embodiment, the convolutional layer builds a self-attention mechanism to calculate the attention weight matrix between the characters of the text to be labeled; it is used to map attention weights onto the relationships between the characters and thereby quantify the mutual influence between the characters of the text.
  • This embodiment draws on the EM algorithm to compute the attention weights in an unsupervised manner, where the EM algorithm includes:
  • Step E computes a probability distribution over the matrix of dimension m*c output by the convolutional layer, which includes calculating an m*k attention weight matrix, where k < m. k kernels (cores) are established, and the hidden variable Z_ak corresponding to each character a and each kernel is estimated from the current parameters, where Kernal is the kernel function, x is the vector representation of each character a, and θ denotes the distribution parameters under each kernel;
  • Step M re-estimates the algorithm parameters according to the probability distribution output by Step E; in the formula used, n is the character length of the text to be labeled and t is the iteration round of the EM steps, and the effect is to take a weighted average over the hidden variables Z_ak. This is an unsupervised process.
  • Step E and Step M converge after multiple iterations, completing the calculation of the attention weight matrix and thus the correlation weights between the characters of the text to be labeled.
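  • The E/M iteration can be pictured with the following sketch. It uses a Gaussian kernel as the kernel function and a soft-assignment normalisation; these choices, the initialisation, and all names are assumptions for illustration, since the text only states that k < m kernels are built, that the hidden variables Z_ak are estimated from the current parameters, and that the M step takes a weighted average over Z_ak.

```python
import numpy as np

def em_attention(feats: np.ndarray, k: int, iters: int = 10, sigma: float = 1.0) -> np.ndarray:
    """feats: m*c matrix from the convolutional layers; returns an m*k attention weight matrix."""
    m, _ = feats.shape
    mu = feats[np.random.choice(m, k, replace=False)]   # k kernels (cores), initialised from characters
    for _ in range(iters):
        # E step: estimate the hidden variables Z_ak from the current parameters (Gaussian kernel assumed).
        d2 = ((feats[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)   # squared distances, m*k
        z = np.exp(-d2 / (2 * sigma ** 2))
        z /= z.sum(axis=1, keepdims=True)                # normalise each character over the k kernels
        # M step: re-estimate the parameters as a weighted average over the hidden variables Z_ak.
        mu = (z.T @ feats) / z.sum(axis=0)[:, None]
    return z

weights = em_attention(np.random.randn(6, 128), k=3)     # m = 6 characters, k = 3 kernels
print(weights.shape)   # (6, 3)
```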
  • Step S308: add the fully connected layer feature matrix to the attention weight matrix to calculate the probability that each character of the text to be labeled belongs to each label, and output the label with the highest probability for each character as the predicted label sequence.
  • In this embodiment, the fully connected layer feature matrix is obtained by feeding the feature information of the character vectors, word vectors, and position vectors of the text to be labeled into a fully connected layer; this fully connected layer is the fully connected layer of the convolutional neural network. After the fully connected layer feature matrix is added to the attention weight matrix, the probability P that each character belongs to each label is calculated, and the prediction uses the sequence score Z(X) = Σ_y exp(Σ_{i,k} λ_k t_k(y_{i-1}, y_i, x, i) + Σ_{i,l} μ_l s_l(y_i, x, i)), where t and s are feature functions and λ and μ are the corresponding weights; the label with the highest probability Z for each character is output as the predicted label sequence.
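  • The final step can be sketched as follows, under the assumption that the fully connected layer feature matrix and the attention-derived matrix have already been projected to a common m*num_labels shape so that they can be added element-wise. The softmax and per-character argmax used here stand in for the patent's P and Z(X) formulas and are illustrative only.

```python
import numpy as np

def label_probabilities(fc_features: np.ndarray, attn_features: np.ndarray) -> np.ndarray:
    """Add the fully connected layer feature matrix to the attention-derived matrix and
    convert each character's row of scores into label probabilities."""
    scores = fc_features + attn_features                  # element-wise addition, shape m*num_labels
    scores -= scores.max(axis=1, keepdims=True)           # numerical stability
    probs = np.exp(scores)
    return probs / probs.sum(axis=1, keepdims=True)

labels = ['B-ORG', 'I-ORG', 'O']                           # hypothetical tag set
m = 2
probs = label_probabilities(np.random.randn(m, len(labels)), np.random.randn(m, len(labels)))
predicted = [labels[i] for i in probs.argmax(axis=1)]      # highest-probability label per character
print(predicted)
```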
  • In this technical solution, sequence labeling means that, when characters and words are decomposed in natural language processing, each character of the text to be labeled is annotated with its corresponding label attribute; the output result is a label sequence, also called a labeling sequence.
  • Fig. 4 shows a sequence labeling device proposed in an embodiment.
  • The sequence labeling device can be integrated into the aforementioned computer device 110 and specifically includes an embedding layer 402, a convolutional layer 404, a CRF layer 406, and an output layer 408, wherein:
  • The embedding layer 402 is used to obtain the text to be labeled and convert it into vector form, where the vector form includes the character vector, word vector, and position vector of each character;
  • In this embodiment, obtaining the text to be labeled is generally performed by a computer device; here a background server is used, although the task is not limited to a server and may also be undertaken by the other computer devices mentioned above. The background server carries out the sequence labeling computation: a sequence labeling detector is deployed on the background server, and after receiving a sequence labeling detection request it obtains the text to be labeled and stores it in memory.
  • In some embodiments, the text to be labeled may also be stored in a non-volatile storage medium for processing.
  • The embedding layer 402 converts the text to be labeled into vector form comprising the character vector, word vector, and position vector of each character. According to a character vector dictionary, the characters of a text of length m can be mapped one by one to vectors of length n, constructing an m*n matrix; for example, the characters '苹' and '果' can each be mapped to a 300-dimensional vector, constructing a 2*300 matrix.
  • Character and word vectors are generally generated with the classic Word2Vec algorithm, an unsupervised learning algorithm: the sentences of the training corpus are encoded in one-hot form, and the c-bow method (predicting the middle word from its context) or the skip-gram method (predicting the context from the middle word) is used to build triples of middle-word one-hot encoding, middle-word feature encoding, and context-word one-hot encoding; since the one-hot encodings are known, the character or word vector is obtained by training the intermediate feature encoding.
  • The position vector follows the method proposed by Google and is computed as PE(pos, 2i) = sin(pos/10000^(2i/d)) and PE(pos, 2i+1) = cos(pos/10000^(2i/d)), where pos is the position of a character, i is the i-th dimension, and d is the configured dimension of the position vector.
  • The convolutional layer 404 is used to extract the feature information of the vectors output by the embedding layer and to calculate, according to the feature information, the attention weight matrix between the characters of the text to be labeled, so as to map attention weights onto the relationships between the characters;
  • As shown in FIG. 5, in one embodiment the convolutional layer 404 further includes a feature information conversion unit 502 and an attention weight matrix calculation unit 504.
  • The feature information conversion unit 502 is used to extract the feature information of the vectors output by the embedding layer 402, namely of the character vectors, word vectors, and position vectors. Specifically, a single 1-dimensional convolutional layer is first constructed to reduce the feature dimension, and then multiple 1-dimensional convolutional layers are stacked to extract local feature information: the input vector dimension is m*n, the 1-dimensional convolution kernel dimension is preset to 3*n, and the number of channels is c; the kernel performs a sliding convolution with stride 1 along the first dimension, and the final output of the multi-layer convolution is a matrix of dimension m*c.
  • Stacking convolutional layers refines the feature information layer by layer, and deeper layers can better fit the mathematical distribution.
  • The attention weight matrix calculation unit 504 is configured to calculate, according to the feature information of the vectors, the attention weight matrix between the characters of the text to be labeled; the convolutional layer builds a self-attention mechanism that maps attention weights onto the relationships between the characters so as to quantify the mutual influence between the characters of the text.
  • This embodiment draws on the EM algorithm to compute the attention weights in an unsupervised manner, where the EM algorithm includes:
  • Step E computes a probability distribution over the matrix of dimension m*c output by the convolutional layer, which includes calculating an m*k attention weight matrix, where k < m. k kernels (cores) are established, and the hidden variable Z_ak corresponding to each character a and each kernel is estimated from the current parameters, where Kernal is the kernel function, x is the vector representation of each character a, and θ denotes the distribution parameters under each kernel;
  • Step M re-estimates the algorithm parameters according to the probability distribution output by Step E; in the formula used, n is the character length of the text to be labeled and t is the iteration round of the EM steps, and the effect is to take a weighted average over the hidden variables Z_ak. This is an unsupervised process.
  • the E step and the M step converge through multiple iterations to complete the calculation of the attention weight matrix, so as to calculate the correlation weight between each word in the text to be labeled.
  • the CRF layer 406 is configured to add the feature matrix of the fully connected layer and the attention weight matrix output by the convolutional layer to calculate the probability that each word in the text to be labeled belongs to each label;
  • As shown in FIG. 6, in one embodiment the CRF layer 406 further includes a fully connected layer matrix calculation unit 602 and a label probability calculation unit 604.
  • The fully connected layer matrix calculation unit 602 is used to receive the feature information of the character vectors, word vectors, and position vectors and feed it into the fully connected layer to output the fully connected layer feature matrix; in this embodiment, the fully connected layer feature matrix is obtained by feeding the feature information of the character vectors, word vectors, and position vectors of the text to be labeled into the fully connected layer for calculation. The label probability calculation unit 604 adds the fully connected layer feature matrix to the attention weight matrix to calculate the probability P that each character of the text to be labeled belongs to each label, and then determines the highest-probability label Z of each character using the formula Z(X) = Σ_y exp(Σ_{i,k} λ_k t_k(y_{i-1}, y_i, x, i) + Σ_{i,l} μ_l s_l(y_i, x, i)), where t and s are feature functions and λ and μ are the corresponding weights.
  • The fully connected layer is the fully connected layer of the convolutional neural network. Feeding the feature information into the fully connected layer for calculation is existing technology, so the calculation of the output fully connected layer matrix is not repeated here.
  • The output layer 408 is configured to output, for each character of the text to be labeled, the label with the highest probability computed in the CRF layer as the predicted label sequence.
  • In this embodiment, the output layer 408 outputs the highest-probability label Z of each character as a label sequence; that is, each character of the text to be labeled corresponds to its highest-probability label Z, and this label sequence is output as the prediction result.
  • In one embodiment, a computer device is proposed, which includes a memory, a processor, and a computer program stored in the memory and runnable on the processor; when the processor executes the computer program, the following steps are implemented: obtaining the text to be labeled and determining its character vectors, word vectors, and position vectors; extracting the feature information of these vectors; calculating, according to the feature information, the attention weight matrix between the characters of the text to be labeled, so as to map attention weights onto the relationships between the characters; and adding the fully connected layer feature matrix to the attention weight matrix to calculate the probability that each character belongs to each label, outputting the highest-probability label of each character as the predicted label sequence.
  • In this embodiment, obtaining the text to be labeled is generally performed by a computer device; here a background server is used, although the task is not limited to a server and may also be undertaken by the other computer devices mentioned above. The background server carries out the sequence labeling computation: a sequence labeling detector is deployed on the background server, and after receiving a sequence labeling detection request it obtains the text to be labeled and stores it in memory.
  • In some embodiments, the text to be labeled may also be stored in a non-volatile storage medium for processing.
  • In this embodiment, the text information of the text to be labeled is converted into vector form, comprising character vectors, word vectors, and position vectors. According to a character vector dictionary, the characters of a text of length m can be mapped one by one to vectors of length n, constructing an m*n matrix; for example, the characters '苹' and '果' can each be mapped to a 300-dimensional vector, constructing a 2*300 matrix.
  • Character and word vectors are generally generated with the classic Word2Vec algorithm, an unsupervised learning algorithm: the sentences of the training corpus are encoded in one-hot form, and the c-bow method (predicting the middle word from its context) or the skip-gram method (predicting the context from the middle word) is used to build triples of middle-word one-hot encoding, middle-word feature encoding, and context-word one-hot encoding; since the one-hot encodings are known, the character or word vector is obtained by training the intermediate feature encoding.
  • The position vector follows the method proposed by Google and is computed as PE(pos, 2i) = sin(pos/10000^(2i/d)) and PE(pos, 2i+1) = cos(pos/10000^(2i/d)), where pos is the position of a character, i is the i-th dimension, and d is the configured dimension of the position vector.
  • The feature information of the character vectors, word vectors, and position vectors is then extracted: a single 1-dimensional convolutional layer is first constructed to reduce the feature dimension, and then multiple 1-dimensional convolutional layers are stacked to extract local feature information. The input vector dimension is m*n, the 1-dimensional convolution kernel dimension is preset to 3*n, and the number of channels is c; the output of the convolutional layers is a matrix of dimension m*c.
  • In this embodiment, the convolutional layer builds a self-attention mechanism to calculate the attention weight matrix between the characters of the text to be labeled; it is used to map attention weights onto the relationships between the characters and thereby quantify the mutual influence between the characters of the text.
  • This embodiment draws on the EM algorithm to compute the attention weights in an unsupervised manner, where the EM algorithm includes:
  • Step E computes a probability distribution over the matrix of dimension m*c output by the convolutional layer, which includes calculating an m*k attention weight matrix, where k < m. k kernels (cores) are established, and the hidden variable Z_ak corresponding to each character a and each kernel is estimated from the current parameters, where Kernal is the kernel function, x is the vector representation of each character a, and θ denotes the distribution parameters under each kernel;
  • Step M re-estimates the algorithm parameters according to the probability distribution output by Step E; in the formula used, n is the character length of the text to be labeled and t is the iteration round of the EM steps, and the effect is to take a weighted average over the hidden variables Z_ak; this is an unsupervised process.
  • the E step and the M step converge through multiple iterations to complete the calculation of the attention weight matrix, so as to calculate the correlation weight between each word in the text to be labeled.
  • The fully connected layer feature matrix is obtained by feeding the feature information of the character vectors, word vectors, and position vectors of the text to be labeled into a fully connected layer for calculation; the fully connected layer is the fully connected layer of the convolutional neural network.
  • The label with the highest probability for each character is output as the predicted label sequence.
  • In one embodiment, a storage medium storing computer-readable instructions is proposed; the computer-readable storage medium may be non-volatile or volatile.
  • When the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps: obtaining the text to be labeled and determining its character vectors, word vectors, and position vectors; extracting the feature information of these vectors; calculating, according to the feature information, the attention weight matrix between the characters; and adding the fully connected layer feature matrix to the attention weight matrix to calculate the probability that each character belongs to each label, outputting the highest-probability label of each character as the predicted label sequence.
  • In this embodiment, obtaining the text to be labeled is generally performed by a computer device; here a background server is used, although the task is not limited to a server and may also be undertaken by the other computer devices mentioned above. The background server carries out the sequence labeling computation: a sequence labeling detector is deployed on the background server, and after receiving a sequence labeling detection request it obtains the text to be labeled and stores it in memory.
  • In some embodiments, the text to be labeled may also be stored in a non-volatile storage medium for processing.
  • In this embodiment, the text information of the text to be labeled is converted into vector form, comprising character vectors, word vectors, and position vectors. According to a character vector dictionary, the characters of a text of length m can be mapped one by one to vectors of length n, constructing an m*n matrix; for example, the characters '苹' and '果' can each be mapped to a 300-dimensional vector, constructing a 2*300 matrix.
  • Character and word vectors are generally generated with the classic Word2Vec algorithm, an unsupervised learning algorithm: the sentences of the training corpus are encoded in one-hot form, and the c-bow method (predicting the middle word from its context) or the skip-gram method (predicting the context from the middle word) is used to build triples of middle-word one-hot encoding, middle-word feature encoding, and context-word one-hot encoding; since the one-hot encodings are known, the character or word vector is obtained by training the intermediate feature encoding.
  • The position vector follows the method proposed by Google and is computed as PE(pos, 2i) = sin(pos/10000^(2i/d)) and PE(pos, 2i+1) = cos(pos/10000^(2i/d)), where pos is the position of a character, i is the i-th dimension, and d is the configured dimension of the position vector.
  • The feature information of the character vectors, word vectors, and position vectors is then extracted: a single 1-dimensional convolutional layer is first constructed to reduce the feature dimension, and then multiple 1-dimensional convolutional layers are stacked to extract local feature information. The input vector dimension is m*n, the 1-dimensional convolution kernel dimension is preset to 3*n, and the number of channels is c; the output of the convolutional layers is a matrix of dimension m*c.
  • In this embodiment, the convolutional layer builds a self-attention mechanism to calculate the attention weight matrix between the characters of the text to be labeled; it is used to map attention weights onto the relationships between the characters and thereby quantify the mutual influence between the characters of the text.
  • This embodiment draws on the EM algorithm to compute the attention weights in an unsupervised manner, where the EM algorithm includes:
  • Step E computes a probability distribution over the matrix of dimension m*c output by the convolutional layer, which includes calculating an m*k attention weight matrix, where k < m. k kernels (cores) are established, and the hidden variable Z_ak corresponding to each character a and each kernel is estimated from the current parameters, where Kernal is the kernel function, x is the vector representation of each character a, and θ denotes the distribution parameters under each kernel;
  • Step M re-estimates the algorithm parameters according to the probability distribution output by Step E; in the formula used, n is the character length of the text to be labeled and t is the iteration round of the EM steps, and the effect is to take a weighted average over the hidden variables Z_ak; this is an unsupervised process.
  • the E step and the M step converge through multiple iterations to complete the calculation of the attention weight matrix, so as to calculate the correlation weight between each word in the text to be labeled.
  • The fully connected layer feature matrix is obtained by feeding the feature information of the character vectors, word vectors, and position vectors of the text to be labeled into a fully connected layer for calculation; the fully connected layer is the fully connected layer of the convolutional neural network.
  • The label with the highest probability for each character is output as the predicted label sequence.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

A sequence labeling method, apparatus, computer device and storage medium. The method obtains the text to be labeled and determines its character vectors, word vectors, and position vectors (S302); extracts the feature information of these vectors (S304); calculates, according to the feature information, the attention weight matrix between the characters of the text to be labeled, so as to map attention weights onto the relationships between the characters (S306); and adds the fully connected layer feature matrix to the attention weight matrix to calculate the probability that each character of the text belongs to each label, outputting the highest-probability label of each character as the predicted label sequence (S308). Using the convolutional neural network of deep learning and drawing on the attention mechanism of the EM algorithm from the CV (Computer Vision) field, the EM attention computation reduces the amount of calculation for long texts in NLP sequence labeling tasks, improves the efficiency of sequence labeling, and preserves the accuracy of the sequence labeling task.

Description

序列标注方法、装置、计算机设备和存储介质
本申请要求于2020年03月13日提交中国专利局、申请号为202010174873.X,发明名称为“序列标注方法、装置、计算机设备和存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及人工智能技术领域,特别是涉及一种序列标注方法、装置、计算机设备和存储介质。
背景技术
自然语言处理(NLP,Natural Language Processing)中的序列标注任务是指对于文本字词的序列标注,包括命名实体识别、词性标注、知识实体抽取等任务,通常通过监督学习算法实现。现有技术实现算法包括传统机器学习算法(CRF等)以及深度学习算法(Bi-LSTM)等,其中深度学习算法精度效果更佳。而目前前沿的深度学习算法,会在此类任务中额外加入注意力机制,以实现序列单元互相关联权重的特征信息提取。发明人意识到,注意力机制的运算是基于矩阵运算,会在模型训练中造成较高的计算量,从而导致较高的耗时。因此,有必要对现有技术进行改进以期获得更好的用户体验。
发明内容
基于此,有必要针对存在的问题,提供一种活体检测中视频图片的处理方法、装置和可读存储介质,以改善现有视频活体检测的效率。
一种序列标注方法,所述方法包括:
获取待标注文本,确定所述待标注文本的字、词向量和位置向量;
提取所述字、词向量和位置向量的特征信息;
根据所述特征信息计算所述待标注文本中各个字之间的注意力权重矩阵,以对所述待标注文本中各个字之间的关系进行注意力权重映射;
将全连接层特征矩阵与所述注意力权重矩阵相加,以计算所述待标注文本中各个字属于各标签的概率,并将各个字所属各标签中的概率最高者作为标签序列预测结果输出。
一种序列标注装置,包括序列标注模型,所述序列标注模型包括:
嵌入层:用于获取待标注文本,并将所述待标注文本转化为向量形式,其中,向量形式包括各个字的字、词向量和位置向量;
卷积层:用于提取所述嵌入层输出向量的特征信息,并根据所述特征信息计算所述待标注文本中各个字之间的注意力权重矩阵,以对所述待标注文本中各个字之间的关系进行注意力权重映射;
CRF层:用于将全连接层特征矩阵与所述卷积层输出的注意力权重矩阵相加,计算出所述待标注文本中各个字属于各标签的概率;
输出层:用于将所述CRF层中输出的所述待标注文本中各个字属于各标签的概率最高者作为标签序列预测结果输出。
一种计算机设备,包括存储器和处理器,所述存储器中存储有计算机可读指令,所述计算机可读指令被所述处理器执行时,使得所述处理器执行实现如下步骤:
获取待标注文本确定所述待标注文本的字、词向量和位置向量;
提取所述字、词向量和位置向量的特征信息;
根据所述特征信息计算所述待标注文本中各个字之间的注意力权重矩阵,以对所述待标注文本中各个字之间的关系进行注意力权重映射;
将全连接层特征矩阵与所述注意力权重矩阵相加,以计算所述待标注文本中各个字属于各标签的概率,并将各个字所属各标签中的概率最高者作为标签序列预测结果输出。
一种存储有计算机可读指令的存储介质,所述计算机可读指令被一个或多个处理器执行时,使得一个或多个处理器执行实现如下步骤:
获取待标注文本确定所述待标注文本的字、词向量和位置向量;
提取所述字、词向量和位置向量的特征信息;
根据所述特征信息计算所述待标注文本中各个字之间的注意力权重矩阵,以对所述待标注文本中各个字之间的关系进行注意力权重映射;
将全连接层特征矩阵与所述注意力权重矩阵相加,以计算所述待标注文本中各个字属于各标签的概率,并将各个字所属各标签中的概率最高者作为标签序列预测结果输出。
与现有技术NLP技术中采用监督学习算法相比,上述序列标注方法、装置通过嵌入层计算文本的字向量和位置向量,通过卷积层提取文本字、词向量和位置向量的局部特征向量,然后利用EM算法(Exception Maximization Algorithm,期望最大化算法)的注意力机制计算出文本中各个字之间的关联权重,最后根据权重关系计算出文本中每个字属于各个标注的概率,将每个字所属的标注概率的最高概率的标签序列作为预测结果输出文本的序列标注。本申请利用深度学习算法的卷积神经网络,借鉴了CV(Computer Vision,计算机视觉)领域EM算法的注意力机制,EM算法注意力机制的运算在NLP序列标注任务中降低了长文本的计算量,提高了序列标注的效率,并保证了序列标注任务的精度。
附图说明
图1为一个实施例中提供的序列标注方法的实施环境图;
图2为一个实施例中计算机设备的内部结构框图;
图3为一个实施例中序列标注方法的流程图;
图4为一个实施例中序列标注装置的结构框图;
图5为一个实施例中卷积层的结构框图;
图6为一个实施例中CRF层的结构框图。
具体实施方式
为了使本申请的目的、技术方案及优点更加清楚明白,以下结合附图及实施例,对本申请进行进一步详细说明。应当理解,此处所描述的具体实施例仅仅用以解释本申请,并不用于限定本申请。
图1为一个实施例中提供的序列标注方法的实施环境图,如图1所示,在该实施环境中,包括计算机设备110以及终端120。
计算机设备110为序列标注处理设备,例如为测试人员使用的电脑等计算机设备,计算机设备110上安装有序列标注处理工具。终端120上安装有需要进行序列标注处理的应用,当需要进行序列标注处理时,测试人员可以在终端120发出序列标注处理请求,该序列标注处理请求中携带有序列标注处理标识,计算机设备110接收该序列标注处理请求,根据序列标注处理标识获取计算机设备110中与序列标注处理标识对应的测试脚本,然后利用序列标注处理工具执行该测试脚本,对终端120上的应用进行测试,并获取测试脚本对应的序列标注处理结果。
需要说明的是,终端120以及计算机设备110可为智能手机、平板电脑、笔记本电脑、台式计算机、服务器等,但并不局限于此。计算机设备110以及终端120可以通过蓝牙、USB(Universal Serial Bus,通用串行总线)或者其他通讯连接方式进行连接,本申请在此不做限制。
图2为一个实施例中计算机设备的内部结构示意图。如图2所示,该计算机设备包括通过系统总线连接的处理器、非易失性存储介质、存储器和网络接口。其中,该计算机设备的非易失性存储介质存储有操作系统、数据库和计算机程序,数据库中可存储有控件信息序列,该计算机程序被处理器执行时,可使得处理器实现一种序列标注处理方法。该计算机设备的处理器用于提供计算和控制能力,支撑整个计算机设备的运行。该计算机设备的存储器中可存储有计算机可读指令,该计算机可读指令被处理器执行时,可使得处理器执行一种序列标注处理方法。该计算机设备的网络接口用于与终端连接通信。本领域技术人员可以理解,图2中示出的结构,仅仅是与本申请方案相关的部分结构的框图,并不构成对本申请方案所应用于其上的计算机设备的限定,具体的计算机设备可以包括比图中所示更多或更少的部件,或者组合某些部件,或者具有不同的部件布置。
如图3所示,在一个实施例中,提出了一种对自然语言需要处理的文本进行序列标注方法,该方法可以应用于上述的计算机设备110中,具体可以包括以下步骤S302~S308:
步骤S302,获取待标注文本,确定所述待标注文本的字、词向量和位置向量;
在本实施例中,获取待标注文本一般由计算机设备来完成,本实施例中采用后台服务器来完成,当然,此处采用后台服务器来完成并非限定于服务器来完成,如前所述的其他计算机设备也可以承担。在对自然语言处理文本的序列标注技术中,后台服务器承担着序列标注运算工作,将序列标注检测器设置在后台服务器端,在序列标注检测器接收到序列标注的检测请求后,序列标注检测器会获取到待标注的文本,并将待标注文本保存到内存中。
在一些实施例中,也可以将待标注文本保存到非易失性存储介质中进行处理。
在本实施例中,将文本信息转化为向量形式,包括了字、词向量和位置向量。根据字向量字典,能够将长度为m的文本字符,逐一映射为长度为n的向量,从而构建m*n的矩阵。例如,文本输入为[‘苹’,‘果’],则能够将字‘苹’和‘果’两个字依次映射为300 维向量,从而构建2*300维的矩阵。字向量的生成一般通过经典算法Word2Vec来实现,该算法属于非监督学习算法,将训练语料中的语句编码为one-hot(独热编码,又称一位有效编码)形式,并通过c-bow方法(通过前后文预测中间字词)或者skip-gram方法(通过中间字词预测前后文)的形式,构建为中间字词one-hot编码、中间字词特征编码、前后文字词one-hot编码。由于one-hot编码为已知的,因此,通过训练中间特征编码从而获得字或词的字向量或词向量。位置向量则参照Google提出的方法,由于通过卷积神经网络对文本编码信息进行特征提取时会忽略词语顺序信息,加入位置向量以使模型可以利用词向量序列的顺序,位置向量PE采用公式PE(pos,2i)=sin(pos/10000 2i/d)和PE(pos,2i+1)=cos(pos/10000 2i/d)计算得出,其中,公式中的pos表示某个字的位置,i表示第i个维度,d则表示位置向量设定维度。
步骤S304,提取所述字、词向量和位置向量的特征信息;
在本实施例中,提取所述字、词向量和位置向量的特征信息具体来说,先构建一层1维卷积层降低特征维度;再构建多层1维卷积层实现局部特征信息提取,其中,输入的向量维度为m*n,1维卷积核维度预设为3*n,通道数为c;卷积核沿第1维方向进行步长为1的滑动卷积,最终多层卷积层输出维度为m*c的矩阵。设置多层卷积层依序提炼特征信息,更深的层次能够更好拟合数学分布。
步骤S306,根据所述特征信息计算所述待标注文本中各个字之间的注意力权重矩阵,以对所述待标注文本中各个字之间的关系进行注意力权重映射;
在本实施例中,所述卷积层构建自注意力机制,计算待标注文本中各个字之间的注意力权重矩阵,用于对文本中各个字之间的关系进行注意力权重映射,以量化文本中各个字之间的相互影响。本实施例借鉴EM算法进行注意力权重的非监督运算,其中,EM算法包括:
E步对所述卷积层输出的维度为m*c的矩阵进行概率分布计算,包括计算m*k的注意力权重,其中,k<m,采用建立k个核心,各个字符a与核心的对应隐变量
Figure PCTCN2020117162-appb-000001
通过当前参数完成注意力权重的估计其中,Kernal为核函数,x为各个字符a的向量形式表征,θ表示各个核心下的分布参数;
M步根据E步输出的概率分布重新估算算法参数,参数由公式
Figure PCTCN2020117162-appb-000002
计算完成,其中,n为所述待标注文本的字符长度,t为EM步的迭代轮次,作用为对隐变量Z ak求取加权平均值;这是一种无监督过程。
E步与M步再经过多次迭代收敛,以完成注意力权重矩阵计算,从而达到计算出待标注文本中各个字字之间的关联权重。
步骤S308,将全连接层特征矩阵与所属注意力权重矩阵相加,以计算所述待标注文本中各个字属于各标签的概率,并将各个字所属各标签中的概率最高者作为标签序列预测结果输出。
在本实施例中,全连接层特征矩阵是将所述待标注文本的字、词向量和位置向量的特征信息输入到全连接层进行计算得出,所述全连接层为卷积神经网络的全连接层,通过将待标注文本的字、词向量和位置向量的特征信息输入到该全连接层进行计算为现有技术,输出的全连接层矩阵计算过程不赘述。获得全连接层特征矩阵后,将该全连接层特征矩阵与所述注意力权重矩阵相加,计算出待标注文本中各个字属于各标签的概率,各个字属于各标签的概率P采用公式
Figure PCTCN2020117162-appb-000003
实现。然后将各个字所属各标签中的最高概率者作为标签序列预测结果输出,输出的所述待标注文本中各个字属于各标签的概率最高者的预测结果采用公式Z(X)=∑ yexp(∑ i,kλ kt k(y i-1,y i,x,i)+∑ i,lμ ls l(y i,x,i))实现,其中,t和s为特征函数,λ和μ是对应的权值。最后,将各个字所属各标签的概率最高者Z作为标注序列的预测结果输出。本技术方案中序列标注的含义是针对自然语言处理技术中处理字、词分解时,对每个待标注文本中的每个字进行对应标签属性的标注,输出的结果为标签序列,或者是标注序列。
如图4所示为一种实施例中提出的一种序列标注装置,该序列标注装置可以集成于上述的计算机设备110中,具体包括嵌入层402、卷积层404、CRF层406和输出层408。其中,
嵌入层402,用于获取待标注文本,并将所述待标注文本转化为向量形式;其中,向量形式包括各个字的字、词向量和位置向量;
在本实施例中,获取待标注文本一般由计算机设备来完成,本实施例中采用后台服务器来完成,当然,此处采用后台服务器来完成并非限定于服务器来完成,如前所述的其他计算机设备也可以承担。在对自然语言处理文本的序列标注技术中,后台服务器承担着序列标注运算工作,将序列标注检测器设置在后台服务器端,在序列标注检测器接收到序列标注的检测请求后,序列标注检测器会获取到待标注的文本,并将待标注文本保存到内存中。
在一些实施例中,也可以将待标注文本保存到非易失性存储介质中进行处理。所述嵌入层402将所述待标注文本转化为向量形式包括各个字的字、词向量和位置向量。根据字向量字典,能够将长度为m的文本字符,逐一映射为长度为n的向量,从而构建m*n的矩阵。例如,文本输入为[‘苹’,‘果’],则能够将字‘苹’和‘果’两个字依次映射为300维向量,从而构建2*300维的矩阵。字向量的生成一般通过经典算法Word2Vec来实现,该算法属于非监督学习算法,将训练语料中的语句编码为one-hot(独热编码,又称一位有效编码)形式,并通过c-bow方法(通过前后文预测中间字词)或者skip-gram方法(通 过中间字词预测前后文)的形式,构建为中间字词one-hot编码、中间字词特征编码、前后文字词one-hot编码。由于one-hot编码为已知的,因此,通过训练中间特征编码从而获得字或词的字向量或词向量。位置向量则参照Google提出的方法,由于通过卷积神经网络对文本编码信息进行特征提取时会忽略词语顺序信息,加入位置向量以使模型可以利用词向量序列的顺序,位置向量PE采用公式PE(pos,2i)=sin(pos/10000 2i/d)和PE(pos,2i+1)=cos(pos/10000 2i/d)计算得出,其中,公式中的pos表示某个字的位置,i表示第i个维度,d则表示位置向量设定维度。
卷积层404,用于提取所述嵌入层输出向量的特征信息,根据所述特征信息计算所述待标注文本中各个字之间的注意力权重矩阵,以对所述待标注文本中各个字之间的关系进行注意力权重映射;
如图5所示,在一个实施例中,提供了卷积层的结构框图,该卷积层404还包括特征信息转化单元502与注意力权重矩阵计算单元504。所述特征信息转化单元502用于提取所述嵌入层402输出向量的特征信息,其提取所述字、词向量和位置向量的特征信息具体来说,先构建一层1维卷积层降低特征维度;再构建多层1维卷积层实现局部特征信息提取,其中,输入的向量维度为m*n,1维卷积核维度预设为3*n,通道数为c;卷积核沿第1维方向进行步长为1的滑动卷积,最终多层卷积层输出维度为m*c的矩阵。设置多层卷积层依序提炼特征信息,更深的层次能够更好拟合数学分布。
所述注意力矩阵计算单元504用于根据所述向量的特征信息计算所述待标注文本中各个字之间的注意力权重矩阵,所述卷积层构建自注意力机制,用于对文本中各个字之间的关系进行注意力权重映射,以量化文本中各个字之间的相互影响。本实施例借鉴EM算法进行注意力权重的非监督运算,其中,EM算法包括:
E步对所述卷积层输出的维度为m*c的矩阵进行概率分布计算,包括计算m*k的注意力权重,其中,k<m,采用建立k个核心,各个字符a与核心的对应隐变量
Figure PCTCN2020117162-appb-000004
通过当前参数完成注意力权重的估计其中,Kernal为核函数,x为各个字符a的向量形式表征,θ表示各个核心下的分布参数;
M步根据E步输出的概率分布重新估算算法参数,参数由公式
Figure PCTCN2020117162-appb-000005
计算完成,其中,n为所述待标注文本的字符长度,t为EM步的迭代轮次,作用为对隐变量Z ak求取加权平均值;这是一种无监督过程。
E步与M步再经过多次迭代收敛,以完成注意力权重矩阵计算,从而达到计算出待标注文本中各个字之间的关联权重。
CRF层406,用于将全连接层特征矩阵与所述卷积层输出的注意力权重矩阵相加,计算出所述待标注文本中各个字属于各标签的概率;
如图6所示,在一个实施例中,提供了CRF层的结构框图,所述CRF层406还包括全连接层矩阵计算单元602与标签概率计算单元604,所述全连接层矩阵计算单元602用于接收所述字、词向量和位置向量的特征信息,并输入到全连接层计算,以输出全连接层特征矩阵;在本实施例中,全连接层特征矩阵是将所述待标注文本的字、词向量和位置向量的特征信息输入到全连接层进行计算得出,所述全连接层为卷积神经网络的全连接层,通过将待标注文本的字、词向量和位置向量的特征信息输入到该全连接层进行计算为现有技术,输出的全连接层矩阵计算过程不赘述。所述标签概率计算单元604用于将所述全连接层特征矩阵与所述注意力权重矩阵相加,根据公式
Figure PCTCN2020117162-appb-000006
计算出所述待标注文本中各个字属于各标签的概率P;再根据公式Z(X)=∑ yexp(∑ i,kλ kt k(y i-1,y i,x,i)+∑ i,lμ ls l(y i,x,i))计算中各个字所述各标签中的概率最高者Z;其中,t和s为特征函数,λ和μ是对应的权值。
输出层408,用于将所述CRF层中输出的所述待标注文本中各个字属于各标签中的概率最高者作为标签序列预测结果输出。
在本实施例中,所述输出层408是将各个字所属各标签的概率最高者Z输出为一个标签序列,即待标注文本中各个字对应有各个标签概率最高者Z的标签序列,将该标签序列作为预测结果输出。
在一个实施例中,提出了一种计算机设备,所述计算机设备包括存储器、处理器及存储在所述存储器上并可在所述处理器上运行的计算机程序,所述处理器执行所述计算机程序时实现以下步骤:
获取待标注文本,确定所述待标注文本的字、词向量和位置向量;
提取所述字、词向量和位置向量的特征信息;
根据所述特征信息计算所述待标注文本中各个字之间的注意力权重矩阵,以对所述待标注文本中各个字之间的关系进行注意力权重映射;
将全连接层特征矩阵与所述注意力权重矩阵相加,以计算所述待标注文本中各个字属于各标签的概率,并将各个字所属各标签中的概率最高者作为标签序列预测结果输出。
在本实施例中,获取待标注文本一般由计算机设备来完成,本实施例中采用后台服务器来完成,当然,此处采用后台服务器来完成并非限定于服务器来完成,如前所述的其他计算机设备也可以承担。在对自然语言处理文本的序列标注技术中,后台服务器承担着序列标注运算工作,将序列标注检测器设置在后台服务器端,在序列标注检测器接收到序列标注的检测请求后,序列标注检测器会获取到待标注的文本,并将待标注文本保存到内存中。
在一些实施例中,也可以将待标注文本保存到非易失性存储介质中进行处理。
在本实施例中,将待标注文本的文本信息转化为向量形式,包括了字、词向量和位置向量。根据字向量字典,能够将长度为m的文本字符,逐一映射为长度为n的向量,从而构建m*n的矩阵。例如,文本输入为[‘苹’,‘果’],则能够将字‘苹’和‘果’两个字依次映射为300维向量,从而构建2*300维的矩阵。字向量的生成一般通过经典算法Word2Vec来实现,该算法属于非监督学习算法,将训练语料中的语句编码为one-hot(独热编码,又称一位有效编码)形式,并通过c-bow方法(通过前后文预测中间字词)或者skip-gram方法(通过中间字词预测前后文)的形式,构建为中间字词one-hot编码、中间字词特征编码、前后文字词one-hot编码。由于one-hot编码为已知的,因此,通过训练中间特征编码从而获得字或词的字向量或词向量。位置向量则参照Google提出的方法,由于通过卷积神经网络对文本编码信息进行特征提取时会忽略词语顺序信息,加入位置向量以使模型可以利用词向量序列的顺序,位置向量PE采用公式PE(pos,2i)=sin(pos/10000 2i/d)和PE(pos,2i+1)=cos(pos/10000 2i/d)计算得出,其中,公式中的pos表示某个字的位置,i表示第i个维度,d则表示位置向量设定维度。
在本实施例中,提取所述字、词向量和位置向量的特征信息具体来说,先构建一层1维卷积层降低特征维度;再构建多层1维卷积层实现局部特征信息提取,其中,输入的向量维度为m*n,1维卷积核维度预设为3*n,通道数为c;卷积核沿第1维方向进行步长为1的滑动卷积,最终多层卷积层输出维度为m*c的矩阵。设置多层卷积层依序提炼特征信息,更深的层次能够更好拟合数学分布。
在本实施例中,所述卷积层构建自注意力机制,计算待标注文本中各个字之间的注意力权重矩阵,用于对文本中各个字之间的关系进行注意力权重映射,以量化文本中各个字之间的相互影响。本实施例借鉴EM算法进行注意力权重的非监督运算,其中,EM算法包括:
E步对所述卷积层输出的维度为m*c的矩阵进行概率分布计算,包括计算m*k的注意力权重,其中,k<m,采用建立k个核心,各个字符a与核心的对应隐变量
Figure PCTCN2020117162-appb-000007
通过当前参数完成权重的估计其中,Kernal为核函数,x为各个字符a的向量形式表征,θ表示各个核心下的分布参数;
M步根据E步输出的概率分布重新估算算法参数,参数由公式
Figure PCTCN2020117162-appb-000008
计算完成,其中,n为所述待标注文本的字符长度,t为EM步的迭代轮次,作用为对隐变量Z ak求取加权平均值,这是一个无监督过程。
E步与M步再经过多次迭代收敛,以完成注意力权重矩阵计算,从而达到计算出待标注文本中各个字之间的关联权重。
在本实施例中,全连接层特征矩阵是将所述待标注文本的字、词向量和位置向量的特征信息输入到全连接层进行计算得出,所述全连接层为卷积神经网络的全连接层,通过将待标注文本的字、词向量和位置向量的特征信息输入到该全连接层进行计算为现有技术,输出的全连接层矩阵计算过程不赘述。获得全连接层特征矩阵后,将该全连接层特征矩阵与所述注意力权重矩阵相加,计算出待标注文本中各个字属于各标签的概率,各个字属于各标签的概率P采用公式
Figure PCTCN2020117162-appb-000009
实现。然后将各个字所属各标签中的最高概率者作为标签序列预测结果输出,输出的所述待标注文本中各个字属于各标签的概率最高者的预测结果采用公式Z(X)=∑ yexp(∑ i,kλ kt k(y i-1,y i,x,i)+∑ i,lμ ls l(y i,x,i))实现,其中,t和s为特征函数,λ和μ是对应的权值。最后,将各个字所属各标签的概率最高者Z作为标注序列的预测结果输出。
在一个实施例中,提出了一种存储有计算机可读指令的存储介质,所述计算机可读存储介质可以是非易失性,也可以是易失性的。该计算机可读指令被一个或多个处理器执行时,使得一个或多个处理器执行以下步骤:
获取待标注文本,确定所述待标注文本的字、词向量和位置向量;
提取所述字、词向量和位置向量的特征信息;
根据所述特征信息计算所述待标注文本中各个字之间的注意力权重矩阵,以对所述待标注文本中各个字之间的关系进行注意力权重映射;
将全连接层特征矩阵与所述注意力权重矩阵相加,以计算所述待标注文本中各个字属于各标签的概率,并将各个字所属各标签中的概率最高者作为标签序列预测结果输出。
在本实施例中,获取待标注文本一般由计算机设备来完成,本实施例中采用后台服务器来完成,当然,此处采用后台服务器来完成并非限定于服务器来完成,如前所述的其他计算机设备也可以承担。在对自然语言处理文本的序列标注技术中,后台服务器承担着序列标注运算工作,将序列标注检测器设置在后台服务器端,在序列标注检测器接收到序列标注的检测请求后,序列标注检测器会获取到待标注的文本,并将待标注文本保存到内存中。
在一些实施例中,也可以将待标注文本保存到非易失性存储介质中进行处理。
在本实施例中,将待标注文本的文本信息转化为向量形式,包括了字、词向量和位置向量。根据字向量字典,能够将长度为m的文本字符,逐一映射为长度为n的向量,从而构建m*n的矩阵。例如,文本输入为[‘苹’,‘果’],则能够将字‘苹’和‘果’两个字依次映射为300维向量,从而构建2*300维的矩阵。字向量的生成一般通过经典算法Word2Vec来实现,该算法属于非监督学习算法,将训练语料中的语句编码为one-hot(独热编码,又称一位有效编码)形式,并通过c-bow方法(通过前后文预测中间字词)或者skip-gram方法(通过中间字词预测前后文)的形式,构建为中间字词one-hot编码、中间字词特征编码、前后文字词one-hot编码。由于one-hot编码为已知的,因此,通过训练中间特征编码从而获得字或词的字向量或词向量。位置向量则参照Google提出的方法,由于通过卷积神经网络对文本编码信息进行特征提取时会忽略词语顺序信息,加入位置向量以使模型可以利用词向量序列的顺序,位置向量PE采用公式PE(pos,2i)=sin(pos/10000 2i/d)和PE(pos,2i+1)=cos(pos/10000 2i/d)计算得出,其中,公式中的pos表示某个字的位置,i表示第i个维度,d则表示位置向量设定维度。
在本实施例中,提取所述字、词向量和位置向量的特征信息具体来说,先构建一层1维卷积层降低特征维度;再构建多层1维卷积层实现局部特征信息提取,其中,输入的向量维度为m*n,1维卷积核维度预设为3*n,通道数为c;卷积核沿第1维方向进行步长为1的滑动卷积,最终多层卷积层输出维度为m*c的矩阵。设置多层卷积层依序提炼特征信息,更深的层次能够更好拟合数学分布。
在本实施例中,所述卷积层构建自注意力机制,计算待标注文本中各个字之间的注意力权重矩阵,用于对文本中各个字之间的关系进行注意力权重映射,以量化文本中各个字之间的相互影响。本实施例借鉴EM算法进行注意力权重的非监督运算,其中,EM算法包括:
E步对所述卷积层输出的维度为m*c的矩阵进行概率分布计算,包括计算m*k的注意力权重,其中,k<m,采用建立k个核心,各个字符a与核心的对应隐变量
Figure PCTCN2020117162-appb-000010
通过当前参数完成权重的估计其中,Kernal为核函数,x为各个字符a的向量形式表征,θ表示各个核心下的分布参数;
M步根据E步输出的概率分布重新估算算法参数,参数由公式
Figure PCTCN2020117162-appb-000011
计算完成,其中,n为所述待标注文本的字符长度,t为EM步的迭代轮次,作用为对隐变量Z ak求取加权平均值;这是一种无监督过程。
E步与M步再经过多次迭代收敛,以完成注意力权重矩阵计算,从而达到计算出待标注文本中各个字之间的关联权重。
在本实施例中,全连接层特征矩阵是将所述待标注文本的字、词向量和位置向量的特征信息输入到全连接层进行计算得出,所述全连接层为卷积神经网络的全连接层,通过将待标注文本的字、词向量和位置向量的特征信息输入到该全连接层进行计算为现有技术,输出的全连接层矩阵计算过程不赘述。获得全连接层特征矩阵后,将该全连接层特征矩阵与所述注意力权重矩阵相加,计算出待标注文本中各个字属于各标签的概率,各个字属于各标签的概率P采用公式
Figure PCTCN2020117162-appb-000012
实现。然后将各个字所属各标签中的最高概率者作为标签序列预测结果输出,输出的所述待标注文本中各个字属于各标签的概率最高者的预测结果采用公式Z(X)=∑ yexp(∑ i,kλ kt k(y i-1,y i,x,i)+∑ i,lμ ls l(y i,x,i))实现,其中,t和s为特征函数,λ和μ是对应的权值。最后,将各个字所属各标签的概率最高者Z作为标注序列的预测结果输出。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (20)

  1. 一种序列标注方法,其中,所述方法包括:
    获取待标注文本,确定所述待标注文本的字、词向量和位置向量;
    提取所述字、词向量和位置向量的特征信息;
    根据所述特征信息计算所述待标注文本中各个字之间的注意力权重矩阵,以对所述待标注文本中各个字之间的关系进行注意力权重映射;
    将全连接层特征矩阵与所述注意力权重矩阵相加,以计算所述待标注文本中各个字属于各标签的概率,并将各个字所属各标签中的概率最高者作为标签序列预测结果输出。
  2. 如权利要求1所述的序列标注方法,其中,所述字、词向量的生成采用将训练语料中的语句编码为one-hot形式,并通过c-bow方法或者skip-gram方法的形式,构建为中间字词one-hot编码、中间字词特征编码、前后文字词one-hot编码;
    所述位置向量由公式PE(pos,2i)=sin(pos/10000 2i/d)和PE(pos,2i+1)=cos(pos/10000 2i/d)确认,其中,pos表示某个字的位置,i表示第i维度,d表示位置向量设定维度。
  3. 如权利要求1或2所述的序列标注方法,其中,所述提取所述字、词向量和位置向量的特征信息具体包括如下步骤:
    构建一层1维卷积层降低特征维度;
    再构建多层1维卷积层实现局部特征信息提取,其中,输入的向量维度为m*n,1维卷积核维度预设为3*n,通道数为c;
    卷积核沿第1维方向进行步长为1的滑动卷积,最终多层卷积层输出维度为m*c的矩阵,以得到所述待标注文本的字、词向量和位置向量的特征信息。
  4. 如权利要求3所述的序列标注方法,其中,所述根据所述特征信息计算所述待标注文本中各个字之间的注意力权重矩阵,以对所述待标注文本中各个字之间的关系进行注意力权重映射具体包括如下步骤:
    采用EM算法的E步对所述多层卷积层输出的维度为m*c的矩阵进行概率分布计算,包括计算m*k的注意力权重,其中,k<m,采用建立k个核心,各个字符a与核心的对应隐变量
    Figure PCTCN2020117162-appb-100001
    通过当前参数完成权重的估计,其中,Kernal为核函数,x为各个字符a的向量形式表征,θ表示各个核心下的分布参数;
    采用EM算法的M步根据E步输出的概率分布重新估算算法参数,参数由公式
    Figure PCTCN2020117162-appb-100002
    计算完成,其中,n为所述待标注文本的字符长度,t为EM步的迭代轮次;
    E步与M步再经过多次迭代收敛,以输出所述待标注文本中各个字之间的注意力权重矩阵,实现对所述待标注文本中各个字之间的关系进行注意力权重映射。
  5. 如权利要求4所述的序列标注方法,其中,所述将全连接层输出矩阵与所述注意力权重矩阵相加,以计算所述待标注文本中各个字属于各标签的概率,并将各个字所属各标签中的概率最高者作为预测结果序列输出具体包括如下步骤:
    将所述字、词向量和位置向量的特征信息输入到全连接层计算,以输出全连接层特征矩阵;
    将所述全连接层特征矩阵与所述注意力权重矩阵相加,根据公式
    Figure PCTCN2020117162-appb-100003
    计算出所述待标注文本中各个字属于各标签的概率P;
    根据公式Z(X)=∑ yexp(∑ i,kλ kt k(y i-1,y i,x,i)+∑ i,lμ ls l(y i,x,i))计算中各个字所述各标签中的概率最高者Z;其中,t和s为特征函数,λ和μ是对应的权值;
    将各个字所述各标签的概率最高者Z作为标注序列的预测结果输出。
  6. 一种序列标注装置,其中,包括序列标注模型,所述序列标注模型包括:
    嵌入层:用于获取待标注文本,并将所述待标注文本转化为向量形式,其中,向量形式包括各个字的字、词向量和位置向量;
    卷积层:用于提取所述嵌入层输出向量的特征信息,并根据所述特征信息计算所述待标注文本中各个字之间的注意力权重矩阵,以对所述待标注文本中各个字之间的关系进行注意力权重映射;
    CRF层:用于将全连接层特征矩阵与所述卷积层输出的注意力权重矩阵相加,计算出所述待标注文本中各个字属于各标签的概率;
    输出层:用于将所述CRF层中输出的所述待标注文本中各个字属于各标签的概率最高者作为标签序列预测结果输出。
  7. 如权利要求6所述的序列标注装置,其中,所述卷积层还包括特征信息转化单元与注意力权重矩阵计算单元;
    所述特征信息转化单元用于提取所述嵌入层输出向量的特征信息,包括:
    构建一层1维卷积层降低特征维度;
    再构建多层1维卷积层实现局部特征信息提取,其中,输入的向量维度为m*n,1维卷积核维度预设为3*n,通道数为c;
    卷积核沿第1维方向进行步长为1的滑动卷积,最终多层卷积层输出维度为m*c的矩阵,以得到所述待标注文本的字、词向量和位置向量的特征信息。
  8. 如权利要求6或7所述的序列标注装置,其中,所述卷积层具体用于:
    采用EM算法的E步对所述卷积层输出的维度为m*c的矩阵进行概率分布计算,包括计算m*k的注意力权重,其中,k<m,采用建立k个核心,各个字符a与核心的对应隐变量
    Figure PCTCN2020117162-appb-100004
    通过当前参数完成注意力权重的估计,其中,Kernal为核函数,x为各个字符a的向量形式表征,θ表示各个核心下的分布参数;
    采用EM算法的M步根据E步输出的概率分布重新估算算法参数,参数由公式
    Figure PCTCN2020117162-appb-100005
    计算完成,其中,n为所述待标注文本的字符长度,t为EM步的迭代轮次;
    E步与M步再经过多次迭代收敛,以输出所述待标注文本中各个字之间的注意力权重矩阵,实现对所述待标注文本中各个字之间的关系进行注意力权重映射。
  9. 如权利要求8所述的序列标注装置,其中,所述CRF层还包括全连接层矩阵计算单元与标签概率计算单元;
    所述全连接层矩阵计算单元用于接收所述字、词向量和位置向量的特征信息,并输入到全连接层计算,以输出全连接层特征矩阵;
    所述标签概率计算单元用于将所述全连接层特征矩阵与所述注意力权重矩阵相加,根据公式
    Figure PCTCN2020117162-appb-100006
    计算出所述待标注文本中各个字属于各标签的概率P;再根据公式Z(X)=∑ yexp(∑ i,kλ kt k(y i-1,y i,x,i)+∑ i,lμ ls l(y i,x,i))计算中各个字所述各标签中的概率最高者Z;其中,t和s为特征函数,λ和μ是对应的权值。
  10. 如权利要求6所述的序列标注装置,其中,所述字、词向量的生成采用将训练语料中的语句编码为one-hot形式,并通过c-bow方法或者skip-gram方法的形式,构建为中间字词one-hot编码、中间字词特征编码、前后文字词one-hot编码;
    所述位置向量由公式PE(pos,2i)=sin(pos/10000 2i/d)和PE(pos,2i+1)=cos(pos/10000 2i/d)确认,其中,pos表示某个字的位置,i 表示第i维度,d表示位置向量设定维度。
  11. 一种计算机设备,包括存储器和处理器,所述存储器中存储有计算机可读指令,所述计算机可读指令被所述处理器执行时,使得所述处理器执行实现如下步骤:
    获取待标注文本,确定所述待标注文本的字、词向量和位置向量;
    提取所述字、词向量和位置向量的特征信息;
    根据所述特征信息计算所述待标注文本中各个字之间的注意力权重矩阵,以对所述待标注文本中各个字之间的关系进行注意力权重映射;
    将全连接层特征矩阵与所述注意力权重矩阵相加,以计算所述待标注文本中各个字属于各标签的概率,并将各个字所属各标签中的概率最高者作为标签序列预测结果输出。
  12. 如权利要求11所述的计算机设备,其中,所述字、词向量的生成采用将训练语料中的语句编码为one-hot形式,并通过c-bow方法或者skip-gram方法的形式,构建为中间字词one-hot编码、中间字词特征编码、前后文字词one-hot编码;
    所述位置向量由公式PE(pos,2i)=sin(pos/10000 2i/d)和PE(pos,2i+1)=cos(pos/10000 2i/d)确认,其中,pos表示某个字的位置,i表示第i维度,d表示位置向量设定维度。
  13. 如权利要求11或12所述的计算机设备,其中,所述提取所述字、词向量和位置向量的特征信息具体包括如下步骤:
    构建一层1维卷积层降低特征维度;
    再构建多层1维卷积层实现局部特征信息提取,其中,输入的向量维度为m*n,1维卷积核维度预设为3*n,通道数为c;
    卷积核沿第1维方向进行步长为1的滑动卷积,最终多层卷积层输出维度为m*c的矩阵,以得到所述待标注文本的字、词向量和位置向量的特征信息。
  14. 如权利要求13所述的计算机设备,其中,所述根据所述特征信息计算所述待标注文本中各个字之间的注意力权重矩阵,以对所述待标注文本中各个字之间的关系进行注意力权重映射具体包括如下步骤:
    采用EM算法的E步对所述多层卷积层输出的维度为m*c的矩阵进行概率分布计算,包括计算m*k的注意力权重,其中,k<m,采用建立k个核心,各个字符a与核心的对应隐变量
    Figure PCTCN2020117162-appb-100007
    通过当前参数完成权重的估计,其中,Kernal为核函数,x为各个字符a的向量形式表征,θ表示各个核心下的分布参数;
    采用EM算法的M步根据E步输出的概率分布重新估算算法参数,参数由公式
    Figure PCTCN2020117162-appb-100008
    计算完成,其中,n为所述待标注文本的字符长度,t为EM步的迭代轮次;
    E步与M步再经过多次迭代收敛,以输出所述待标注文本中各个字之间的注意力权重矩阵,实现对所述待标注文本中各个字之间的关系进行注意力权重映射。
  15. 如权利要求14所述的计算机设备,其中,所述将全连接层输出矩阵与所述注意力权重矩阵相加,以计算所述待标注文本中各个字属于各标签的概率,并将各个字所属各标签中的概率最高者作为预测结果序列输出具体包括如下步骤:
    将所述字、词向量和位置向量的特征信息输入到全连接层计算,以输出全连接层特征矩阵;
    将所述全连接层特征矩阵与所述注意力权重矩阵相加,根据公式
    Figure PCTCN2020117162-appb-100009
    计算出所述待标注文本中各个字属于各标签的概率P;
    根据公式Z(X)=∑ yexp(∑ i,kλ kt k(y i-1,y i,x,i)+∑ i,lμ ls l(y i,x,i))计算中各个字所述各标签中的概率最高者Z;其中,t和s为特征函数,λ和μ是对应的权值;
    将各个字所述各标签的概率最高者Z作为标注序列的预测结果输出。
  16. 一种计算机可读存储介质,其中,所述计算机可读存储介质存储有计算机程序,所述计算机程序被处理器执行时用于实现一种序列标注方法,所述方法包括以下步骤:
    获取待标注文本,确定所述待标注文本的字、词向量和位置向量;
    提取所述字、词向量和位置向量的特征信息;
    根据所述特征信息计算所述待标注文本中各个字之间的注意力权重矩阵,以对所述待标注文本中各个字之间的关系进行注意力权重映射;
    将全连接层特征矩阵与所述注意力权重矩阵相加,以计算所述待标注文本中各个字属于各标签的概率,并将各个字所属各标签中的概率最高者作为标签序列预测结果输出。
  17. 如权利要求16所述的计算机可读存储介质,其中,所述字、词向量的生成采用将训练语料中的语句编码为one-hot形式,并通过c-bow方法或者skip-gram方法的形式,构建为中间字词one-hot编码、中间字词特征编码、前后文字词one-hot编码;
    所述位置向量由公式PE(pos,2i)=sin(pos/10000 2i/d)和PE(pos,2i+1)=cos(pos/10000 2i/d)确认,其中,pos表示某个字的位置,i表示第i维度,d表示位置向量设定维度。
  18. 如权利要求16或17所述的计算机可读存储介质,其中,所述提取所述字、词向量和位置向量的特征信息具体包括如下步骤:
    构建一层1维卷积层降低特征维度;
    再构建多层1维卷积层实现局部特征信息提取,其中,输入的向量维度为m*n,1维卷积核维度预设为3*n,通道数为c;
    卷积核沿第1维方向进行步长为1的滑动卷积,最终多层卷积层输出维度为m*c的矩阵,以得到所述待标注文本的字、词向量和位置向量的特征信息。
  19. 如权利要求18所述的计算机可读存储介质,其中,所述根据所述特征信息计算所述待标注文本中各个字之间的注意力权重矩阵,以对所述待标注文本中各个字之间的关系进行注意力权重映射具体包括如下步骤:
    采用EM算法的E步对所述多层卷积层输出的维度为m*c的矩阵进行概率分布计算,包括计算m*k的注意力权重,其中,k<m,采用建立k个核心,各个字符a与核心的对应隐变量
    Figure PCTCN2020117162-appb-100010
    通过当前参数完成权重的估计,其中,Kernal为核函数,x为各个字符a的向量形式表征,θ表示各个核心下的分布参数;采用EM算法的M步根据E步输出的概率分布重新估算算法参数,参数由公式
    Figure PCTCN2020117162-appb-100011
    计算完成,其中,n为所述待标注文本的字符长度,t为EM步的迭代轮次;
    E步与M步再经过多次迭代收敛,以输出所述待标注文本中各个字之间的注意力权重矩阵,实现对所述待标注文本中各个字之间的关系进行注意力权重映射。
  20. 如权利要求19所述的计算机可读存储介质,其中,所述将全连接层输出矩阵与所述注意力权重矩阵相加,以计算所述待标注文本中各个字属于各标签的概率,并将各个字所属各标签中的概率最高者作为预测结果序列输出具体包括如下步骤:
    将所述字、词向量和位置向量的特征信息输入到全连接层计算,以输出全连接层特征矩阵;
    将所述全连接层特征矩阵与所述注意力权重矩阵相加,根据公式
    Figure PCTCN2020117162-appb-100012
    计算出所述待标注文本中各个字属于各标签的概率P;
    根据公式Z(X)=∑ yexp(∑ i,kλ kt k(y i-1,y i,x,i)+∑ i,lμ ls l(y i,x,i))计算中各个字所述各标签中的概率最高者Z;其中,t和s为特征函数,λ和μ是对应的权值;
    将各个字所述各标签的概率最高者Z作为标注序列的预测结果输出。
PCT/CN2020/117162 2020-03-13 2020-09-23 Sequence labeling method, apparatus, computer device and storage medium WO2021179570A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010174873.XA CN111460807B (zh) 2020-03-13 2020-03-13 Sequence labeling method, apparatus, computer device and storage medium
CN202010174873.X 2020-03-13

Publications (1)

Publication Number Publication Date
WO2021179570A1 (zh)

Family

ID=71680782

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/117162 WO2021179570A1 (zh) 2020-03-13 2020-09-23 序列标注方法、装置、计算机设备和存储介质

Country Status (2)

Country Link
CN (1) CN111460807B (zh)
WO (1) WO2021179570A1 (zh)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114048288A (zh) * 2021-11-10 2022-02-15 北京明略软件系统有限公司 细粒度情感分析方法、系统、计算机设备和存储介质
CN114580424A (zh) * 2022-04-24 2022-06-03 之江实验室 一种用于法律文书的命名实体识别的标注方法和装置
CN114707467A (zh) * 2022-03-18 2022-07-05 浙江大学 一种基于自注意力机制的自动化拼音转汉字方法
CN114861601A (zh) * 2022-04-29 2022-08-05 桂林电子科技大学 基于旋转式编码的事件联合抽取方法及存储介质
CN114925197A (zh) * 2022-03-28 2022-08-19 中南大学 基于主题注意力的深度学习文本分类模型训练方法
CN116342964A (zh) * 2023-05-24 2023-06-27 杭州有朋网络技术有限公司 针对于电子商务平台的图片宣传的风控系统及其方法

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111460807B (zh) * 2020-03-13 2024-03-12 平安科技(深圳)有限公司 序列标注方法、装置、计算机设备和存储介质
CN112069816A (zh) * 2020-09-14 2020-12-11 深圳市北科瑞声科技股份有限公司 中文标点符号添加方法和系统及设备
CN112507698B (zh) * 2020-12-07 2024-05-24 深圳市优必选科技股份有限公司 字向量生成方法、装置、终端设备及计算机可读存储介质
CN112597825A (zh) * 2020-12-07 2021-04-02 深延科技(北京)有限公司 驾驶场景分割方法、装置、电子设备和存储介质
CN112507719A (zh) * 2020-12-18 2021-03-16 北京百度网讯科技有限公司 命名实体识别方法、装置、电子设备及存储介质
CN112651242B (zh) * 2021-01-20 2024-04-26 重庆大学 一种基于内外注意力机制和可变尺度卷积的文本分类方法
CN113051897B (zh) * 2021-05-25 2021-09-10 中国电子科技集团公司第三十研究所 一种基于Performer结构的GPT2文本自动生成方法
CN113571052A (zh) * 2021-07-22 2021-10-29 湖北亿咖通科技有限公司 一种噪声提取及指令识别方法和电子设备

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110222337A (zh) * 2019-05-28 2019-09-10 浙江邦盛科技有限公司 一种基于transformer和CRF的中文地址分词方法
CN110223742A (zh) * 2019-06-14 2019-09-10 中南大学 中文电子病历数据的临床表现信息抽取方法和设备
CN110287326A (zh) * 2019-07-03 2019-09-27 上海冰鉴信息科技有限公司 一种带背景描述的企业情感分析方法
CN110442840A (zh) * 2019-07-11 2019-11-12 新华三大数据技术有限公司 序列标注网络更新方法、电子病历处理方法及相关装置
CN110827816A (zh) * 2019-11-08 2020-02-21 杭州依图医疗技术有限公司 语音指令识别方法、装置、电子设备及存储介质
US20200065374A1 (en) * 2018-08-23 2020-02-27 Shenzhen Keya Medical Technology Corporation Method and system for joint named entity recognition and relation extraction using convolutional neural network
CN111460807A (zh) * 2020-03-13 2020-07-28 平安科技(深圳)有限公司 序列标注方法、装置、计算机设备和存储介质

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110276066B (zh) * 2018-03-16 2021-07-27 北京国双科技有限公司 实体关联关系的分析方法及相关装置
CN109408812A (zh) * 2018-09-30 2019-03-01 北京工业大学 一种基于注意力机制的序列标注联合抽取实体关系的方法
CN110781683B (zh) * 2019-11-04 2024-04-05 河海大学 一种实体关系联合抽取方法

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200065374A1 (en) * 2018-08-23 2020-02-27 Shenzhen Keya Medical Technology Corporation Method and system for joint named entity recognition and relation extraction using convolutional neural network
CN110222337A (zh) * 2019-05-28 2019-09-10 浙江邦盛科技有限公司 一种基于transformer和CRF的中文地址分词方法
CN110223742A (zh) * 2019-06-14 2019-09-10 中南大学 中文电子病历数据的临床表现信息抽取方法和设备
CN110287326A (zh) * 2019-07-03 2019-09-27 上海冰鉴信息科技有限公司 一种带背景描述的企业情感分析方法
CN110442840A (zh) * 2019-07-11 2019-11-12 新华三大数据技术有限公司 序列标注网络更新方法、电子病历处理方法及相关装置
CN110827816A (zh) * 2019-11-08 2020-02-21 杭州依图医疗技术有限公司 语音指令识别方法、装置、电子设备及存储介质
CN111460807A (zh) * 2020-03-13 2020-07-28 平安科技(深圳)有限公司 序列标注方法、装置、计算机设备和存储介质

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114048288A (zh) * 2021-11-10 2022-02-15 北京明略软件系统有限公司 细粒度情感分析方法、系统、计算机设备和存储介质
CN114707467A (zh) * 2022-03-18 2022-07-05 浙江大学 一种基于自注意力机制的自动化拼音转汉字方法
CN114925197A (zh) * 2022-03-28 2022-08-19 中南大学 基于主题注意力的深度学习文本分类模型训练方法
CN114580424A (zh) * 2022-04-24 2022-06-03 之江实验室 一种用于法律文书的命名实体识别的标注方法和装置
CN114580424B (zh) * 2022-04-24 2022-08-05 之江实验室 一种用于法律文书的命名实体识别的标注方法和装置
CN114861601A (zh) * 2022-04-29 2022-08-05 桂林电子科技大学 基于旋转式编码的事件联合抽取方法及存储介质
CN114861601B (zh) * 2022-04-29 2024-04-12 桂林电子科技大学 基于旋转式编码的事件联合抽取方法及存储介质
CN116342964A (zh) * 2023-05-24 2023-06-27 杭州有朋网络技术有限公司 针对于电子商务平台的图片宣传的风控系统及其方法
CN116342964B (zh) * 2023-05-24 2023-08-01 杭州有朋网络技术有限公司 针对于电子商务平台的图片宣传的风控系统及其方法

Also Published As

Publication number Publication date
CN111460807B (zh) 2024-03-12
CN111460807A (zh) 2020-07-28

Similar Documents

Publication Publication Date Title
WO2021179570A1 (zh) 序列标注方法、装置、计算机设备和存储介质
CN112668671B (zh) 预训练模型的获取方法和装置
WO2022134759A1 (zh) 关键词生成方法、装置、电子设备及计算机存储介质
WO2021135910A1 (zh) 基于机器阅读理解的信息抽取方法、及其相关设备
WO2022155994A1 (zh) 基于注意力的深度跨模态哈希检索方法、装置及相关设备
CN110705301B (zh) 实体关系抽取方法及装置、存储介质、电子设备
CN111062215B (zh) 基于半监督学习训练的命名实体识别方法和装置
CN111666427B (zh) 一种实体关系联合抽取方法、装置、设备及介质
CN111401084B (zh) 一种机器翻译的方法、设备以及计算机可读存储介质
WO2019154411A1 (zh) 词向量更新方法和装置
WO2020244065A1 (zh) 基于人工智能的字向量定义方法、装置、设备及存储介质
WO2021027125A1 (zh) 序列标注方法、装置、计算机设备和存储介质
WO2021082086A1 (zh) 机器阅读方法、系统、装置及存储介质
CN114564593A (zh) 多模态知识图谱的补全方法、装置和电子设备
CN112699686A (zh) 基于任务型对话系统的语义理解方法、装置、设备及介质
CN110134965A (zh) 用于信息处理的方法、装置、设备和计算机可读存储介质
US20230215203A1 (en) Character recognition model training method and apparatus, character recognition method and apparatus, device and storage medium
WO2022228127A1 (zh) 要素文本处理方法、装置、电子设备和存储介质
Huang et al. An effective multimodal representation and fusion method for multimodal intent recognition
WO2021244099A1 (zh) 语音编辑方法、电子设备及计算机可读存储介质
CN113705207A (zh) 语法错误识别方法及装置
WO2023116572A1 (zh) 一种词句生成方法及相关设备
CN116861363A (zh) 多模态的特征处理方法、装置、存储介质与电子设备
CN114911940A (zh) 文本情感识别方法及装置、电子设备、存储介质
JP2017538226A (ja) スケーラブルなウェブデータの抽出

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20924662

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20924662

Country of ref document: EP

Kind code of ref document: A1