WO2019196236A1 - Semantic role analysis method, readable storage medium, terminal device and apparatus - Google Patents


Info

Publication number
WO2019196236A1
Authority
WO
WIPO (PCT)
Prior art keywords
participle
vector
speech
word
input matrix
Application number
PCT/CN2018/096258
Other languages
French (fr)
Chinese (zh)
Inventor
张依 (Zhang Yi)
汪伟 (Wang Wei)
肖京 (Xiao Jing)
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date
2018-04-09
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2019196236A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/30 Semantic analysis

Definitions

  • the present application belongs to the field of computer technology and relates in particular to a semantic role analysis method, a computer readable storage medium, a terminal device and an apparatus.
  • mainstream semantic role analysis research focuses mainly on applying machine learning techniques that use multiple linguistic features to identify and classify semantic roles.
  • the usual practice is to first use one neural network model to determine the part of speech of each word segment, and then use another neural network model to determine the semantic role of each word segment. Because the influence of the whole sentence on the judgement for each word segment has to be considered within a single neural network model during calculation, the neural network model is often very complex to construct, computationally heavy and inefficient.
  • the embodiments of the present application provide a semantic role analysis method, a computer readable storage medium, a terminal device and an apparatus, so as to solve the problem that current semantic role analysis methods must consider the influence of the whole sentence on the judgement for each word segment within a single neural network model, which makes the neural network model very complex to construct, computationally heavy and inefficient.
  • the first aspect of the embodiment of the present application provides a semantic role analysis method, which may include:
  • the second input matrix of each participle is input into a preset second neural network model to obtain a second output vector of each participle, and the second neural network model is a neural network model for performing reverse-sequence part-of-speech analysis;
  • the part-of-speech vector database is a database recording the correspondence between part-of-speech types and part-of-speech vectors;
  • the third input matrix of each participle is input into a preset third neural network model to obtain a third output vector of each participle, and the third neural network model is a neural network model for performing positive sequence semantic role analysis;
  • the fourth input matrix of each participle is respectively input into a preset fourth neural network model to obtain a fourth output vector of each participle, and the fourth neural network model is a neural network model for performing reverse order semantic role analysis;
  • the semantic role type of each word segment is determined according to the third output vector and the fourth output vector of each participle.
  • a second aspect of embodiments of the present application provides a computer readable storage medium storing computer readable instructions that, when executed by a processor, implement the steps of the semantic role analysis method described above.
  • a third aspect of embodiments of the present application provides a semantic role analysis terminal device including a memory, a processor, and computer readable instructions stored in the memory and executable on the processor, where the processor implements the steps of the semantic role analysis method described above when executing the computer readable instructions.
  • a fourth aspect of the embodiments of the present application provides a semantic role analysis apparatus, which may include a module for implementing the steps of the semantic role analysis method described above.
  • the embodiments of the present application split the originally complex neural network model into relatively simple neural network models, and then process the outputs of the individual neural network models together.
  • FIG. 1 is a flowchart of an embodiment of a semantic role analysis method according to an embodiment of the present application
  • FIG. 2 is a schematic flow chart of a processing procedure of a first neural network model
  • FIG. 3 is a schematic flow chart of a processing procedure of a second neural network model
  • FIG. 4 is a schematic flow chart of a processing procedure of a third neural network model
  • FIG. 5 is a schematic flow chart of a processing procedure of a fourth neural network model
  • FIG. 6 is a structural diagram of an embodiment of a semantic role analysis apparatus according to an embodiment of the present application.
  • FIG. 7 is a schematic block diagram of a semantic role analysis terminal device according to an embodiment of the present application.
  • an embodiment of a semantic role analysis method in an embodiment of the present application may include:
  • Step S101 performing word segmentation on the sentence text to obtain each word segment constituting the sentence text.
  • word segmentation refers to dividing the sentence text into individual words, namely the word segments.
  • the sentence text can be segmented according to a general dictionary so that the segmented words are normal words; characters that do not form a dictionary word are split off as single characters.
  • when the characters can form a word in both the forward and the backward direction, for example "要求神" ("require God"), the split is decided by statistical word frequency: if "要求" ("require") has the higher frequency, the result is "要求/神"; if "求神" ("pray to God") has the higher frequency, the result is "要/求神".
  • Step S102 Search for a word vector of each word segment in a preset word vector database, and respectively construct a first input matrix and a second input matrix of each word segment according to the word vector.
  • the word vector database is a database recording the correspondence between words and word vectors, and the word vectors may be obtained by training on words with the word2vec model.
  • the first input matrix of each word segment can be separately constructed according to the following formula:
  • where n is the index of the word segment in sentence order, $1 \le n \le N$, and N is the total number of word segments in the sentence text; cl is the row number of the first input matrix, $1 \le cl \le CoupLen$, where CoupLen is a preset coupling length; wvl is the column number of the first input matrix, $1 \le wvl \le wVecLen$, where wVecLen is the length of the word vector of any participle; and the word vector of the nth participle is $WordVec_n$, with
  • $WordVec_n = (WdVecEm_{n,1}, WdVecEm_{n,2}, \ldots, WdVecEm_{n,wvl}, \ldots, WdVecEm_{n,wVecLen})$,
  • $FwWdMatrix_n$ is the first input matrix of the nth participle.
  • the second input matrix of each participle is constructed according to the following formula:
  • $BkWdMatrix_n$ is the second input matrix of the nth participle.
  • Step S103 the first input matrix of each word segment is separately input into a preset first neural network model, and a first output vector of each word segment is obtained.
  • the first neural network model is a neural network model for performing positive-sequence part-of-speech analysis, and the processing process of the first neural network model may specifically include the steps shown in FIG. 2:
  • step S1031 the first composite vector of each participle is calculated separately.
  • the first composite vector of each participle can be separately calculated according to the following formula:
  • $FwWdCpVec_n = (FwWdCpEm_{n,1}, FwWdCpEm_{n,2}, \ldots, FwWdCpEm_{n,wvl}, \ldots, FwWdCpEm_{n,wVecLen})$
  • Ln is a natural logarithm function
  • tanh is a hyperbolic tangent function
  • $FwWdWt_{wvl}$ and $FwWdWt'_{wvl}$ are preset weight coefficients.
  • Step S1032 respectively calculating a first probability value of each part of speech type.
  • the first probability value of each part of speech type may be separately calculated according to the following formula:
  • m is the index of the part-of-speech type, $1 \le m \le M$
  • M is the number of part-of-speech types
  • $FwWdWtVec_m$ is the preset weight vector corresponding to the mth part-of-speech type.
  • $FwWdProb_{n,m}$ is the first probability value that the nth participle is of the mth part-of-speech type.
  • step S1033 a first output vector of each participle is constructed.
  • the first output vector of each participle can be constructed according to the following formula:
  • $FwWdVec_n = (FwWdProb_{n,1}, FwWdProb_{n,2}, \ldots, FwWdProb_{n,m}, \ldots, FwWdProb_{n,M})$
  • $FwWdVec_n$ is the first output vector of the nth participle.
  • Step S104 the second input matrix of each word segment is separately input into a preset second neural network model, and a second output vector of each word segment is obtained.
  • the second neural network model is a neural network model for performing reverse-sequence part-of-speech analysis, and the processing process of the second neural network model may specifically include the steps shown in FIG. 3:
  • step S1041 the second composite vector of each participle is calculated separately.
  • the second composite vector of each participle can be separately calculated according to the following formula:
  • $BkWdCpVec_n = (BkWdCpEm_{n,1}, BkWdCpEm_{n,2}, \ldots, BkWdCpEm_{n,wvl}, \ldots, BkWdCpEm_{n,wVecLen})$
  • $BkWdWt_{wvl}$ and $BkWdWt'_{wvl}$ are preset weight coefficients.
  • Step S1042 respectively calculating a second probability value of each part of speech type.
  • the second probability value of each part of speech type may be separately calculated according to the following formula:
  • $BkWdWtVec_m$ is the preset weight vector corresponding to the mth part-of-speech type
  • $BkWdProb_{n,m}$ is the second probability value that the nth participle is of the mth part-of-speech type.
  • Step S1043 constructing a second output vector of each participle.
  • the second output vector of each participle can be constructed according to the following formula:
  • $BkWdVec_n = (BkWdProb_{n,1}, BkWdProb_{n,2}, \ldots, BkWdProb_{n,m}, \ldots, BkWdProb_{n,M})$
  • $BkWdVec_n$ is the second output vector of the nth participle.
  • Step S105 determining the part of speech type of each participle according to the first output vector and the second output vector of each participle.
  • the part-of-speech probability vector of each participle can be calculated according to the following formula:
  • $WdProbVec_n = (WdProb_{n,1}, WdProb_{n,2}, \ldots, WdProb_{n,m}, \ldots, WdProb_{n,M})$
  • where $WdProb_{n,m} = \lambda_1 \cdot FwWdProb_{n,m} + \lambda_2 \cdot BkWdProb_{n,m}$, $\lambda_1$ and $\lambda_2$ are preset weight coefficients, and $WdProbVec_n$ is the part-of-speech probability vector of the nth participle.
  • the part-of-speech type number of the nth participle is then $CharSeq_n = \arg\max_{1 \le m \le M} WdProb_{n,m}$, where argmax returns the argument at which the function attains its maximum; that is, the part-of-speech type corresponding to the largest element of the part-of-speech probability vector of the nth participle is determined as its part-of-speech type.
  • Step S106 Search for a part-of-speech vector corresponding to the part-of-speech type of each participle in the preset part-of-speech vector database, and construct a third input matrix and a fourth input matrix of each participle according to the part-of-speech vector respectively.
  • the part of speech vector database is a database for recording the correspondence between the part of speech type and the part of speech vector.
  • a part-of-speech vector is the vector form of a part-of-speech type; it represents the probability of that part-of-speech type occurring according to its context information.
  • to train the part-of-speech vectors, each part-of-speech type is first expressed as a 0-1 (one-hot) vector and a model is then trained: a neural network model uses the part-of-speech types of n-1 words to predict the part-of-speech type of the nth word, and the intermediate result obtained from the trained model is used as the part-of-speech vector.
  • the one-hot vector of the part-of-speech type "noun" is [1, 0, 0, 0, ..., 0];
  • the one-hot vector of the part-of-speech type "adjective" is [0, 1, 0, 0, ..., 0];
  • the one-hot vector of the part-of-speech type "verb" is [0, 0, 1, 0, ..., 0];
  • the model is trained to generate a coefficient matrix W of the hidden layer
  • the product of the one-hot vector of each part-of-speech type and the coefficient matrix is the part-of-speech vector of that type; its final form is a multidimensional vector such as [-0.11, 0.26, -0.03, ..., 0.71].
  • the third input matrix of each word segment can be separately constructed according to the following formula:
  • where n is the index of the word segment in sentence order, $1 \le n \le N$, and N is the total number of word segments in the sentence text; cl is the row number of the third input matrix, $1 \le cl \le CoupLen$, where CoupLen is a preset coupling length; cvl is the column number of the third input matrix, $1 \le cvl \le cVecLen$, where cVecLen is the length of the part-of-speech vector of any participle; and the part-of-speech vector of the nth participle is $CharVec_n$, with
  • $CharVec_n = (CrVecEm_{n,1}, CrVecEm_{n,2}, \ldots, CrVecEm_{n,cvl}, \ldots, CrVecEm_{n,cVecLen})$,
  • $FwCrMatrix_n$ is the third input matrix of the nth participle.
  • the fourth input matrix of each participle is constructed according to the following formula:
  • $BkCrMatrix_n$ is the fourth input matrix of the nth participle.
  • Step S107 the third input matrix of each participle is respectively input into a preset third neural network model, and a third output vector of each participle is obtained.
  • the third neural network model is a neural network model for performing positive sequence semantic role analysis, and the processing process of the third neural network model may specifically include the steps shown in FIG. 4:
  • Step S1071 respectively calculating a third composite vector of each participle.
  • the third composite vector of each participle can be separately calculated according to the following formula:
  • $FwCrCpVec_n = (FwCrCpEm_{n,1}, FwCrCpEm_{n,2}, \ldots, FwCrCpEm_{n,cvl}, \ldots, FwCrCpEm_{n,cVecLen})$
  • Ln is a natural logarithm function
  • tanh is a hyperbolic tangent function
  • $FwCrWt_{cvl}$ and $FwCrWt'_{cvl}$ are preset weight coefficients.
  • Step S1072 respectively calculating a first probability value of each semantic role type.
  • the first probability value of each semantic role type may be separately calculated according to the following formula:
  • l is the index of the semantic role type, $1 \le l \le L$, L is the number of semantic role types, and $FwCrWtVec_l$ is the preset weight vector corresponding to the lth semantic role type.
  • $FwCrProb_{n,l}$ is the first probability value that the nth participle is of the lth semantic role type.
  • step S1073 a third output vector of each participle is constructed.
  • the third output vector of each participle can be constructed according to the following formula:
  • $FwCrVec_n = (FwCrProb_{n,1}, FwCrProb_{n,2}, \ldots, FwCrProb_{n,l}, \ldots, FwCrProb_{n,L})$
  • $FwCrVec_n$ is the third output vector of the nth participle.
  • Step S108 the fourth input matrix of each word segment is separately input into a preset fourth neural network model, and a fourth output vector of each word segment is obtained.
  • the fourth neural network model is a neural network model for performing reverse-sequence semantic role analysis, and the processing of the fourth neural network model may specifically include the steps shown in FIG. 5:
  • step S1081 the fourth composite vector of each participle is calculated separately.
  • the fourth composite vector of each participle can be separately calculated according to the following formula:
  • $BkCrCpVec_n = (BkCrCpEm_{n,1}, BkCrCpEm_{n,2}, \ldots, BkCrCpEm_{n,cvl}, \ldots, BkCrCpEm_{n,cVecLen})$, where
  • $BkCrWt_{cvl}$ and $BkCrWt'_{cvl}$ are preset weight coefficients.
  • Step S1082 respectively calculating a second probability value of each semantic role type.
  • the second probability value of each semantic role type may be separately calculated according to the following formula:
  • $BkCrWtVec_l$ is the preset weight vector corresponding to the lth semantic role type
  • $BkCrProb_{n,l}$ is the second probability value that the nth participle is of the lth semantic role type.
  • Step S1083 constructing a fourth output vector of each participle.
  • the fourth output vector of each participle can be constructed according to the following formula:
  • $BkCrVec_n = (BkCrProb_{n,1}, BkCrProb_{n,2}, \ldots, BkCrProb_{n,l}, \ldots, BkCrProb_{n,L})$
  • $BkCrVec_n$ is the fourth output vector of the nth participle.
  • Step S109 determining a semantic role type of each word segment according to the third output vector and the fourth output vector of each participle.
  • the semantic role probability vector of each word segment can be separately calculated according to the following formula:
  • $CrProbVec_n = (CrProb_{n,1}, CrProb_{n,2}, \ldots, CrProb_{n,l}, \ldots, CrProb_{n,L})$
  • where $CrProb_{n,l} = \lambda_1 \cdot FwCrProb_{n,l} + \lambda_2 \cdot BkCrProb_{n,l}$, $\lambda_1$ and $\lambda_2$ are preset weight coefficients, and $CrProbVec_n$ is the semantic role probability vector of the nth participle.
  • the semantic role type number of the nth participle is then $RoleSeq_n = \arg\max_{1 \le l \le L} CrProb_{n,l}$, where argmax returns the argument at which the function attains its maximum; that is, the semantic role type corresponding to the largest element of the semantic role probability vector of the nth participle is determined as its semantic role type.
  • in summary, the embodiments of the present application use two neural network models in each of the two most critical processes (part-of-speech analysis and semantic role analysis), so that the originally complex neural network model is split into relatively simple neural network models whose outputs are then processed together to obtain the result. Because the structure of each neural network model is simplified, the amount of computation is greatly reduced and analysis efficiency is improved.
  • FIG. 6 is a structural diagram of an embodiment of a semantic role analysis apparatus provided by an embodiment of the present application.
  • a semantic role analysis apparatus may include:
  • the word segmentation module 601 is configured to perform word segmentation on the sentence text to obtain each word segment constituting the sentence text;
  • the word vector searching module 602 is configured to separately search for a word vector of each word segment in a preset word vector database, where the word vector database is a database for recording a correspondence between words and word vectors;
  • a word vector matrix construction module 603 configured to respectively construct a first input matrix and a second input matrix of each word segment according to the word vector;
  • the first processing module 604 is configured to input the first input matrix of each word segment into the preset first neural network model to obtain a first output vector of each word segment, where the first neural network model is a neural network model for positive-sequence part-of-speech analysis;
  • the second processing module 605 is configured to input the second input matrix of each participle into the preset second neural network model to obtain a second output vector of each participle, where the second neural network model is a neural network model for reverse-sequence part-of-speech analysis;
  • the part of speech type determining module 606 is configured to determine a part of speech type of each participle according to the first output vector and the second output vector of each participle;
  • the part of speech vector search module 607 is configured to search for a part of speech vector corresponding to the part of speech type of each participle in a preset part of speech vector database, where the part of speech vector database is a database for recording the correspondence between the part of speech type and the part of speech vector;
  • the part of speech vector matrix construction module 608 is configured to respectively construct a third input matrix and a fourth input matrix of each word segment according to the part of speech vector;
  • the third processing module 609 is configured to input the third input matrix of each word segment into the preset third neural network model to obtain a third output vector of each word segment, where the third neural network model is a neural network model for positive-sequence semantic role analysis;
  • the fourth processing module 610 is configured to input the fourth input matrix of each participle into a preset fourth neural network model to obtain a fourth output vector of each participle, where the fourth neural network model is a neural network model for reverse-sequence semantic role analysis;
  • the semantic role type determining module 611 is configured to determine a semantic role type of each word segment according to the third output vector and the fourth output vector of each word segment.
  • the specific embodiments of the semantic role analysis apparatus are substantially the same as the foregoing embodiments of the semantic role analysis method; reference may be made to the related description in the method embodiments, and details are not repeated here.
  • FIG. 7 is a schematic block diagram of a semantic role analysis terminal device provided by an embodiment of the present application. For convenience of description, only parts related to the embodiment of the present application are shown.
  • the semantic role analysis terminal device 7 may be a computing device such as a mobile phone, a tablet computer, a desktop computer, a notebook, a palmtop computer, and a cloud server.
  • the semantic role analysis terminal device 7 may include a processor 70, a memory 71, and computer readable instructions 72 stored in the memory 71 and executable on the processor 70, such as instructions for performing the semantic role analysis method described above.
  • when executing the computer readable instructions 72, the processor 70 implements the steps in the embodiments of the semantic role analysis method described above.
  • if the functional units in the various embodiments of the present application are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes a number of computer readable instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the various embodiments of the present application.
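The dictionary-plus-frequency rule described for Step S101 can be illustrated with a small dynamic-programming segmenter. This is a minimal sketch, not the patent's implementation: it assumes a unigram model that picks the split maximizing the product of word probabilities, a hypothetical dictionary `freq` of word counts, and a hypothetical penalty for out-of-dictionary single characters. With the toy counts used below, the ambiguous string 要求神 is split as 要求/神 when 要求 is the more frequent word, and as 要/求神 when 求神 is.

```python
import math

def segment(text, freq):
    """Split text into words maximizing the sum of log unigram probabilities.

    freq: hypothetical dictionary mapping words to corpus counts (must be non-empty).
    Characters not covered by any dictionary word fall back to single characters.
    """
    total = sum(freq.values())
    unk_logp = math.log(0.5 / total)          # hypothetical penalty for unknown single characters
    max_len = max(len(w) for w in freq)
    n = len(text)
    # best[i] = (best score for text[:i], start index of the last word in that split)
    best = [(-math.inf, 0)] * (n + 1)
    best[0] = (0.0, 0)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in freq:
                logp = math.log(freq[word] / total)
            elif end - start == 1:
                logp = unk_logp
            else:
                continue                      # multi-character string not in the dictionary
            score = best[start][0] + logp
            if score > best[end][0]:
                best[end] = (score, start)
    # Walk the back-pointers from the end of the text to recover the words.
    words, end = [], n
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]

freq = {"要求": 100, "求神": 10, "要": 50, "求": 20, "神": 30}   # hypothetical counts
print(segment("要求神", freq))
```

Comparing products of probabilities rather than raw counts keeps splits with fewer, more frequent words preferred over splits into many single characters.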
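The explicit formulas for the first and second input matrices (Step S102) are not reproduced on this page; only their dimensions are defined (CoupLen rows, one vector element per column). The sketch below assumes one plausible reading that the source does not confirm: the first (forward) matrix of participle n stacks the word vectors of participles n-CoupLen+1 through n, the second (reverse) matrix stacks participles n+CoupLen-1 down to n, and rows falling outside the sentence are zero-padded. The function names are hypothetical; the same construction would apply to the third and fourth matrices with part-of-speech vectors in place of word vectors.

```python
def forward_matrix(vectors, n, coup_len):
    """Hypothetical FwWdMatrix_n: rows are the vectors of participles n-CoupLen+1 .. n.

    vectors: list of equal-length vectors, one per participle, in sentence order.
    n: 1-based participle index. Rows before the sentence start are zero-padded.
    """
    dim = len(vectors[0])
    rows = []
    for cl in range(1, coup_len + 1):
        idx = n - coup_len + cl               # 1-based participle index for row cl
        rows.append(list(vectors[idx - 1]) if idx >= 1 else [0.0] * dim)
    return rows

def backward_matrix(vectors, n, coup_len):
    """Hypothetical BkWdMatrix_n: rows are the vectors of participles n+CoupLen-1 down to n.

    Rows past the end of the sentence are zero-padded.
    """
    dim = len(vectors[0])
    rows = []
    for cl in range(1, coup_len + 1):
        idx = n + coup_len - cl               # 1-based participle index for row cl
        rows.append(list(vectors[idx - 1]) if idx <= len(vectors) else [0.0] * dim)
    return rows

word_vecs = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]   # toy 2-dimensional word vectors
print(forward_matrix(word_vecs, 1, 2))    # [[0.0, 0.0], [1.0, 1.0]]
print(backward_matrix(word_vecs, 1, 2))   # [[2.0, 2.0], [1.0, 1.0]]
```

In this reading the last row of both matrices is always participle n itself, so each model sees the current word plus CoupLen-1 neighbours on one side only.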
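For Steps S1031 and S1032 the page names only the ingredients of the formulas (the natural logarithm Ln, tanh, per-column weights such as $FwWdWt_{wvl}$ and $FwWdWt'_{wvl}$, and one weight vector $FwWdWtVec_m$ per part-of-speech type), not the formulas themselves. The sketch below is a hypothetical combination of those ingredients, a column-wise log-sum-exp of tanh activations followed by a softmax over per-type scores, chosen only to show the data flow from a CoupLen × vector-length input matrix to an M-element probability vector. It should not be read as the patent's formula; the second, third and fourth models would follow the same shape with their own weights.

```python
import math

def composite_vector(matrix, wt, wt_prime):
    """Hypothetical composite vector: for each column j, Ln of the sum over rows of
    exp(tanh(wt[j] * x + wt_prime[j])). Output length equals the row length."""
    n_cols = len(matrix[0])
    return [
        math.log(sum(math.exp(math.tanh(wt[j] * row[j] + wt_prime[j])) for row in matrix))
        for j in range(n_cols)
    ]

def type_probabilities(cp_vec, type_weight_vecs):
    """Hypothetical probability step: softmax over one score per type, where the score
    of type m is the dot product of its weight vector with the composite vector."""
    scores = [sum(w * x for w, x in zip(wv, cp_vec)) for wv in type_weight_vecs]
    peak = max(scores)                        # subtract the maximum score for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

matrix = [[0.2, -0.1, 0.4], [0.5, 0.3, -0.2]]            # toy 2 x 3 input matrix
cp = composite_vector(matrix, [1.0] * 3, [0.0] * 3)        # toy weights
probs = type_probabilities(cp, [[1, 0, 0], [0, 1, 0], [0, 0, 1]])
print(probs)
```

Applied to $FwWdMatrix_n$, the returned list would play the role of the first output vector $FwWdVec_n$ in this illustration.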
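Step S106 describes each part-of-speech vector as the product of a type's one-hot vector and the trained hidden-layer coefficient matrix W. Because a one-hot row vector contains a single 1, that product simply selects the corresponding row of W. A minimal sketch in plain Python; the three-type inventory and the numbers in W are hypothetical.

```python
def one_hot(pos_type, pos_types):
    """0-1 vector with a single 1 at the position of pos_type in pos_types."""
    vec = [0] * len(pos_types)
    vec[pos_types.index(pos_type)] = 1
    return vec

def pos_vector(pos_type, pos_types, W):
    """Row vector one_hot(pos_type) times W (one row per type, one column per dimension)."""
    oh = one_hot(pos_type, pos_types)
    return [sum(oh[i] * W[i][j] for i in range(len(pos_types))) for j in range(len(W[0]))]

POS_TYPES = ["noun", "adjective", "verb"]     # hypothetical inventory, ordered as in the text
W = [[-0.11, 0.26, -0.03],                    # hypothetical trained coefficients
     [0.42, -0.17, 0.08],
     [0.05, 0.33, 0.71]]

print(one_hot("adjective", POS_TYPES))        # [0, 1, 0]
print(pos_vector("verb", POS_TYPES, W))       # [0.05, 0.33, 0.71], i.e. row 3 of W
```

In practice this is why a part-of-speech vector database can store the rows of W directly and look them up by type, instead of performing the matrix product at query time.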

Abstract

The present application belongs to the technical field of computers, and relates in particular to a semantic role analysis method, a computer-readable storage medium, a terminal device and an apparatus. The method comprises: in the process of part-of-speech analysis, one neural network model is used for forward part-of-speech analysis, and another neural network model is used for reverse part-of-speech analysis. In the process of semantic role analysis, one neural network model is used for forward semantic role analysis, and another neural network model is used for reverse semantic role analysis. Thus, an originally complex neural network model is divided into relatively simple neural network models, and then the output of each neural network model is comprehensively processed to obtain a result. Due to the simplification of the neural network model structure, the computational load is greatly reduced and the analysis efficiency is improved.
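The combination of the forward and reverse models summarized above is spelled out in Step S105 (and, with the role names in place of the part-of-speech names, in Step S109): a weighted sum of the two probability vectors followed by argmax. A minimal sketch; `lam1` and `lam2` stand for the preset weight coefficients, and their default values here are hypothetical.

```python
def fuse_and_pick(fw_probs, bk_probs, lam1=0.5, lam2=0.5):
    """Weighted sum of forward and reverse probability vectors, then argmax.

    Returns the fused probability vector and the 1-based number of the winning type,
    matching the 1..M (or 1..L) numbering used in the text.
    """
    if len(fw_probs) != len(bk_probs):
        raise ValueError("forward and reverse vectors must have the same length")
    fused = [lam1 * f + lam2 * b for f, b in zip(fw_probs, bk_probs)]
    best_index = max(range(len(fused)), key=fused.__getitem__)  # first maximum wins ties
    return fused, best_index + 1

fused, type_no = fuse_and_pick([0.7, 0.2, 0.1], [0.3, 0.6, 0.1])
print(type_no)   # 1
```

With equal weights the vectors above fuse to approximately [0.5, 0.4, 0.1], so type 1 is chosen; weighting the reverse model more heavily (`lam1=0.2, lam2=0.8`) gives approximately [0.38, 0.52, 0.1] and selects type 2.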

Description

Semantic role analysis method, readable storage medium, terminal device and apparatus
This application claims priority to Chinese Patent Application No. 201810309685.6, entitled "Semantic Role Analysis Method, Computer Readable Storage Medium and Terminal Device", filed with the Chinese Patent Office on April 9, 2018, the entire contents of which are incorporated herein by reference.
Technical field
The present application belongs to the field of computer technology, and relates in particular to a semantic role analysis method, a computer readable storage medium, a terminal device and an apparatus.
Background
At present, mainstream semantic role analysis research focuses mainly on applying machine learning techniques that use multiple linguistic features to identify and classify semantic roles. The usual practice is to first use one neural network model to determine the part of speech of each word segment, and then use another neural network model to determine the semantic role of each word segment. Because the influence of the whole sentence on the judgement for each word segment has to be considered within a single neural network model during calculation, the neural network model is often very complex to construct, computationally heavy and inefficient.
Technical problem
In view of this, the embodiments of the present application provide a semantic role analysis method, a computer readable storage medium, a terminal device and an apparatus, so as to solve the problem that current semantic role analysis methods must consider the influence of the whole sentence on the judgement for each word segment within a single neural network model, which makes the neural network model very complex to construct, computationally heavy and inefficient.
Technical solution
A first aspect of the embodiments of the present application provides a semantic role analysis method, which may include:
performing word segmentation on the sentence text to obtain each participle constituting the sentence text;
searching a preset word vector database for the word vector of each participle, and constructing a first input matrix and a second input matrix of each participle according to the word vectors, the word vector database being a database recording the correspondence between words and word vectors;
inputting the first input matrix of each participle into a preset first neural network model to obtain a first output vector of each participle, the first neural network model being a neural network model for positive-sequence part-of-speech analysis;
inputting the second input matrix of each participle into a preset second neural network model to obtain a second output vector of each participle, the second neural network model being a neural network model for reverse-sequence part-of-speech analysis;
determining the part-of-speech type of each participle according to its first output vector and second output vector;
searching a preset part-of-speech vector database for the part-of-speech vector corresponding to the part-of-speech type of each participle, and constructing a third input matrix and a fourth input matrix of each participle according to the part-of-speech vectors, the part-of-speech vector database being a database recording the correspondence between part-of-speech types and part-of-speech vectors;
inputting the third input matrix of each participle into a preset third neural network model to obtain a third output vector of each participle, the third neural network model being a neural network model for positive-sequence semantic role analysis;
inputting the fourth input matrix of each participle into a preset fourth neural network model to obtain a fourth output vector of each participle, the fourth neural network model being a neural network model for reverse-sequence semantic role analysis;
determining the semantic role type of each participle according to its third output vector and fourth output vector.
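The steps above can be wired together as a single pipeline. This is a hypothetical orchestration sketch: the segmenter, both lookup tables, the matrix builder, the four models and the fusion rule are passed in as placeholders (all parameter names are illustrative, not from the patent), so the code shows only the order of operations and which outputs feed which inputs.

```python
from typing import Callable, Dict, List

Vector = List[float]
Matrix = List[Vector]

def semantic_roles(
    sentence: str,
    segment: Callable[[str], List[str]],
    word_vectors: Dict[str, Vector],
    pos_vectors: Dict[int, Vector],
    build_matrix: Callable[[List[Vector], int, bool], Matrix],  # (vectors, n, forward?) -> matrix
    pos_forward: Callable[[Matrix], Vector],
    pos_reverse: Callable[[Matrix], Vector],
    role_forward: Callable[[Matrix], Vector],
    role_reverse: Callable[[Matrix], Vector],
    fuse: Callable[[Vector, Vector], int],                      # returns a 1-based type number
) -> List[int]:
    """Return one 1-based semantic role type number per participle."""
    words = segment(sentence)                          # word segmentation
    word_vecs = [word_vectors[w] for w in words]       # word vector lookup
    positions = range(1, len(words) + 1)
    # Part-of-speech analysis: forward model on the first matrix, reverse model on the second.
    pos_types = [
        fuse(pos_forward(build_matrix(word_vecs, n, True)),
             pos_reverse(build_matrix(word_vecs, n, False)))
        for n in positions
    ]
    pos_vecs = [pos_vectors[t] for t in pos_types]     # part-of-speech vector lookup
    # Semantic role analysis: the same pattern on the third and fourth matrices.
    return [
        fuse(role_forward(build_matrix(pos_vecs, n, True)),
             role_reverse(build_matrix(pos_vecs, n, False)))
        for n in positions
    ]
```

Passing the models in as callables keeps the sketch independent of any particular neural network library and makes the two-stage structure (part-of-speech types first, semantic roles second) explicit.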
A second aspect of the embodiments of the present application provides a computer-readable storage medium storing computer-readable instructions which, when executed by a processor, implement the steps of the semantic role analysis method described above.
A third aspect of the embodiments of the present application provides a semantic role analysis terminal device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor implements the steps of the semantic role analysis method described above when executing the computer-readable instructions.
A fourth aspect of the embodiments of the present application provides a semantic role analysis apparatus, which may include modules for implementing the steps of the semantic role analysis method described above.
Beneficial Effects
Compared with the prior art, the embodiments of the present application have the following beneficial effects: the originally complex neural network model is split into relatively simple neural network models, and the outputs of these models are then combined to obtain the result. Because the structure of each neural network model is simplified, the amount of computation is greatly reduced and analysis efficiency is improved.
Description of the Drawings
FIG. 1 is a flowchart of an embodiment of a semantic role analysis method according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of the processing performed by the first neural network model;
FIG. 3 is a schematic flowchart of the processing performed by the second neural network model;
FIG. 4 is a schematic flowchart of the processing performed by the third neural network model;
FIG. 5 is a schematic flowchart of the processing performed by the fourth neural network model;
FIG. 6 is a structural diagram of an embodiment of a semantic role analysis apparatus according to an embodiment of the present application;
FIG. 7 is a schematic block diagram of a semantic role analysis terminal device according to an embodiment of the present application.
Embodiments of the Invention
Referring to FIG. 1, an embodiment of a semantic role analysis method in the embodiments of the present application may include:
Step S101: perform word segmentation on the sentence text to obtain the word segments that make up the sentence text.
Word segmentation divides a sentence text into individual words, i.e., the word segments. In this embodiment, the sentence text may be segmented according to a general-purpose dictionary so that every extracted segment is a normal vocabulary word; characters that do not belong to any dictionary word are split off as single characters. When a character could form a word with either its preceding or its following neighbor, as in "要求神", the split is decided by statistical word frequency: if "要求" ("require") has the higher frequency, the text is split as "要求/神"; if "求神" ("pray to the gods") has the higher frequency, it is split as "要/求神".
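The frequency-based tie-breaking above can be illustrated with a small sketch. It is an assumption, not the patent's stated algorithm: it swaps in unigram maximum-probability segmentation (dynamic programming over dictionary words, scoring each candidate split by the sum of log relative frequencies), which reproduces the "要求/神" versus "要/求神" choice. The function name `segment`, the toy frequency dictionary, and the `unknown_freq` smoothing count for out-of-dictionary single characters are all hypothetical.

```python
import math
from typing import Dict, List


def segment(sentence: str, freq: Dict[str, int], unknown_freq: int = 1) -> List[str]:
    """Hypothetical unigram max-probability segmentation (not the patent's method).

    freq         -- dictionary word -> positive corpus frequency
    unknown_freq -- pseudo-count for single characters missing from the dictionary
    """
    total = sum(freq.values()) + unknown_freq
    max_len = max((len(w) for w in freq), default=1)
    n = len(sentence)
    best = [-math.inf] * (n + 1)  # best[i]: best log-probability of sentence[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last word ending at i
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = sentence[start:end]
            if word in freq:
                count = freq[word]
            elif end - start == 1:
                count = unknown_freq  # out-of-dictionary character becomes its own word
            else:
                continue
            score = best[start] + math.log(count / total)
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Walk the back-pointers from the end of the sentence to recover the words.
    words = []
    end = n
    while end > 0:
        start = back[end]
        words.append(sentence[start:end])
        end = start
    return words[::-1]
```

With a toy dictionary in which "要求" outweighs "求神", the sketch returns ["要求", "神"]; raising the count of "求神" to 1000 flips the result to ["要", "求神"].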
Step S102: look up the word vector of each word segment in a preset word vector database, and construct a first input matrix and a second input matrix for each word segment from the word vectors.
The word vector database records the correspondence between words and word vectors; the word vectors may be obtained by training on words with the word2vec model.
Specifically, the first input matrix of each word segment may be constructed according to the following formula:
Figure PCTCN2018096258-appb-000001
where n is the position index of a word segment in sentence order, 1 ≤ n ≤ N, and N is the total number of word segments in the sentence text; cl is the row index of the first input matrix, 1 ≤ cl ≤ CoupLen, and CoupLen is a preset coupling length; wvl is the column index of the first input matrix, 1 ≤ wvl ≤ wVecLen, and wVecLen is the length of the word vector of any word segment (all word vectors have the same length); the word vector of the n-th word segment is $\mathrm{WordVec}_n$, and
$$\mathrm{WordVec}_n = (\mathrm{WdVecEm}_{n,1}, \mathrm{WdVecEm}_{n,2}, \ldots, \mathrm{WdVecEm}_{n,wvl}, \ldots, \mathrm{WdVecEm}_{n,wVecLen}),$$
Figure PCTCN2018096258-appb-000002
$\mathrm{FwWdMatrix}_n$ is the first input matrix of the n-th word segment.
The second input matrix of each word segment is constructed according to the following formula:
Figure PCTCN2018096258-appb-000003
Figure PCTCN2018096258-appb-000004
$\mathrm{BkWdMatrix}_n$ is the second input matrix of the n-th word segment.
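Because the matrix formulas above survive only as images, the following is a hedged sketch of one plausible reading, not the patent's formula. It assumes that row cl of the forward matrix holds the word vector of word n − CoupLen + cl (so the last row is the current word and earlier rows are the preceding words), that row cl of the backward matrix holds the word vector of word n + CoupLen − cl (the mirror image, reading from the end of the sentence), and that positions outside 1..N are zero-padded. The function name `build_input_matrices` is hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def build_input_matrices(word_vecs: List[Vector], n: int, coup_len: int) -> Tuple[Matrix, Matrix]:
    """Return (forward, backward) CoupLen x wVecLen matrices for the n-th word (1-based).

    ASSUMPTION: the windowing and zero padding below are an illustrative guess;
    the patent's exact formula is only available as an image.
    """
    num_words = len(word_vecs)
    dim = len(word_vecs[0])

    def vec(i: int) -> Vector:
        # 1-based lookup; positions outside 1..N are zero-padded (assumption).
        return list(word_vecs[i - 1]) if 1 <= i <= num_words else [0.0] * dim

    # Forward: preceding words, ending with the current word in the last row.
    forward = [vec(n - coup_len + cl) for cl in range(1, coup_len + 1)]
    # Backward: following words in reverse order, ending with the current word.
    backward = [vec(n + coup_len - cl) for cl in range(1, coup_len + 1)]
    return forward, backward
```

Under the same assumption, the third and fourth input matrices would be built by the same function applied to the part-of-speech vectors instead of the word vectors.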
Step S103: input the first input matrix of each word segment into a preset first neural network model to obtain a first output vector of each word segment.
The first neural network model is a neural network model that performs forward-order part-of-speech analysis. Its processing may specifically include the steps shown in FIG. 2:
Step S1031: calculate the first composite vector of each word segment.
Specifically, the first composite vector of each word segment may be calculated according to the following formula:
$$\mathrm{FwWdCpVec}_n = (\mathrm{FwWdCpEm}_{n,1}, \mathrm{FwWdCpEm}_{n,2}, \ldots, \mathrm{FwWdCpEm}_{n,wvl}, \ldots, \mathrm{FwWdCpEm}_{n,wVecLen})$$
where
Figure PCTCN2018096258-appb-000005
ln is the natural logarithm function, tanh is the hyperbolic tangent function, and $\mathrm{FwWdWt}_{wvl}$ and $\mathrm{FwWdWt}'_{wvl}$ are preset weight coefficients.
Step S1032: calculate the first probability value of each part-of-speech type.
Specifically, the first probability value of each part-of-speech type may be calculated according to the following formula:
Figure PCTCN2018096258-appb-000006
where m is the index of a part-of-speech type, 1 ≤ m ≤ M, M is the number of part-of-speech types, $\mathrm{FwWdWtVec}_m$ is a preset weight vector corresponding to the m-th part-of-speech type,
Figure PCTCN2018096258-appb-000007
and $\mathrm{FwWdProb}_{n,m}$ is the first probability value that the n-th word segment belongs to the m-th part-of-speech type.
Step S1033: construct the first output vector of each word segment.
Specifically, the first output vector of each word segment may be constructed according to the following formula:
$$\mathrm{FwWdVec}_n = (\mathrm{FwWdProb}_{n,1}, \mathrm{FwWdProb}_{n,2}, \ldots, \mathrm{FwWdProb}_{n,m}, \ldots, \mathrm{FwWdProb}_{n,M})$$
where $\mathrm{FwWdVec}_n$ is the first output vector of the n-th word segment.
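The probability formula in step S1032 is also only available as an image. A common construction that fits the description (one preset weight vector per part-of-speech type, one probability per type) is a softmax over inner products; the sketch below assumes exactly that, i.e. $\mathrm{FwWdProb}_{n,m} = \exp(\mathrm{FwWdWtVec}_m \cdot \mathrm{FwWdCpVec}_n) / \sum_k \exp(\mathrm{FwWdWtVec}_k \cdot \mathrm{FwWdCpVec}_n)$, which may differ from the patent's actual formula. The function name `output_vector` is hypothetical, and the composite vector is taken as given because its ln/tanh formula is not reproduced.

```python
import math
from typing import List


def output_vector(composite: List[float], weight_vecs: List[List[float]]) -> List[float]:
    """Hypothetical softmax reading of steps S1032-S1033.

    composite   -- composite vector of one word segment (e.g. FwWdCpVec_n)
    weight_vecs -- one preset weight vector per type (e.g. FwWdWtVec_1..FwWdWtVec_M)
    Returns the output vector: one probability per type, summing to 1.
    """
    # ASSUMPTION: each type is scored by the inner product of its weight vector
    # with the composite vector, then the scores are normalized with softmax.
    scores = [sum(w * c for w, c in zip(wv, composite)) for wv in weight_vecs]
    top = max(scores)  # subtract the maximum before exponentiating, for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

Under the same assumption, the second, third, and fourth models would perform the same step with their own weight vectors ($\mathrm{BkWdWtVec}_m$, $\mathrm{FwCrWtVec}_l$, $\mathrm{BkCrWtVec}_l$).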
Step S104: input the second input matrix of each word segment into a preset second neural network model to obtain a second output vector of each word segment.
The second neural network model is a neural network model that performs reverse-order part-of-speech analysis. Its processing may specifically include the steps shown in FIG. 3:
Step S1041: calculate the second composite vector of each word segment.
Specifically, the second composite vector of each word segment may be calculated according to the following formula:
$$\mathrm{BkWdCpVec}_n = (\mathrm{BkWdCpEm}_{n,1}, \mathrm{BkWdCpEm}_{n,2}, \ldots, \mathrm{BkWdCpEm}_{n,wvl}, \ldots, \mathrm{BkWdCpEm}_{n,wVecLen})$$
where
Figure PCTCN2018096258-appb-000008
$\mathrm{BkWdWt}_{wvl}$ and $\mathrm{BkWdWt}'_{wvl}$ are preset weight coefficients.
Step S1042: calculate the second probability value of each part-of-speech type.
Specifically, the second probability value of each part-of-speech type may be calculated according to the following formula:
Figure PCTCN2018096258-appb-000009
where $\mathrm{BkWdWtVec}_m$ is a preset weight vector corresponding to the m-th part-of-speech type, and $\mathrm{BkWdProb}_{n,m}$ is the second probability value that the n-th word segment belongs to the m-th part-of-speech type.
Step S1043: construct the second output vector of each word segment.
Specifically, the second output vector of each word segment may be constructed according to the following formula:
$$\mathrm{BkWdVec}_n = (\mathrm{BkWdProb}_{n,1}, \mathrm{BkWdProb}_{n,2}, \ldots, \mathrm{BkWdProb}_{n,m}, \ldots, \mathrm{BkWdProb}_{n,M})$$
where $\mathrm{BkWdVec}_n$ is the second output vector of the n-th word segment.
Step S105: determine the part-of-speech type of each word segment according to its first output vector and second output vector.
Specifically, the part-of-speech probability vector of each word segment may be calculated according to the following formula:
$$\mathrm{WdProbVec}_n = (\mathrm{WdProb}_{n,1}, \mathrm{WdProb}_{n,2}, \ldots, \mathrm{WdProb}_{n,m}, \ldots, \mathrm{WdProb}_{n,M})$$
where $\mathrm{WdProb}_{n,m} = \eta_1 \cdot \mathrm{FwWdProb}_{n,m} + \eta_2 \cdot \mathrm{BkWdProb}_{n,m}$, $\eta_1$ and $\eta_2$ are preset weight coefficients, and $\mathrm{WdProbVec}_n$ is the part-of-speech probability vector of the n-th word segment.
The part-of-speech type of each word segment is then determined according to the following formula:
$$\mathrm{CharSeq}_n = \arg\max(\mathrm{WdProbVec}_n)$$
where argmax is the arg max function (it returns the index of the largest element) and $\mathrm{CharSeq}_n$ is the part-of-speech type index of the n-th word segment. That is, the part-of-speech type corresponding to the largest element of the n-th word segment's part-of-speech probability vector is taken as the part-of-speech type of that word segment.
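Unlike the image-only formulas, step S105 is fully specified in the text, so it can be sketched directly: a weighted sum of the forward and backward probability vectors followed by argmax. The function name `fuse_and_pick` and the example weights are hypothetical; the index is returned 1-based to match m = 1..M.

```python
from typing import List, Tuple


def fuse_and_pick(fw_probs: List[float], bk_probs: List[float],
                  eta1: float, eta2: float) -> Tuple[List[float], int]:
    """Combine forward/backward probability vectors and pick the best type.

    Implements WdProb_{n,m} = eta1 * FwWdProb_{n,m} + eta2 * BkWdProb_{n,m}
    followed by CharSeq_n = argmax(WdProbVec_n).
    Returns (fused probability vector, 1-based index of its largest element).
    """
    if len(fw_probs) != len(bk_probs):
        raise ValueError("forward and backward vectors must have the same length")
    fused = [eta1 * f + eta2 * b for f, b in zip(fw_probs, bk_probs)]
    # max() keeps the first index on ties; the text does not specify tie-breaking.
    best = max(range(len(fused)), key=lambda i: fused[i])
    return fused, best + 1
```

Step S109 has the same form with $\xi_1$, $\xi_2$ and the semantic role probability vectors, so the same function would also yield $\mathrm{RoleSeq}_n$.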
Step S106: look up, in a preset part-of-speech vector database, the part-of-speech vector corresponding to the part-of-speech type of each word segment, and construct a third input matrix and a fourth input matrix for each word segment from the part-of-speech vectors.
The part-of-speech vector database records the correspondence between part-of-speech types and part-of-speech vectors. A part-of-speech vector is the vector representation of a part-of-speech type; it encodes how likely that type is to occur given its context. To train the part-of-speech vectors, each part-of-speech type is first represented as a 0-1 (one-hot) vector, and a model is then trained to predict the part-of-speech type of the n-th word from the part-of-speech types of the preceding n−1 words; the intermediate representation produced by the neural network model is used as the part-of-speech vector. For example, suppose the one-hot vector of the part-of-speech type "noun" is [1, 0, 0, 0, ..., 0], that of "adjective" is [0, 1, 0, 0, ..., 0], that of "verb" is [0, 0, 1, 0, ..., 0], and that of "adverb" is [0, 0, 0, 1, ..., 0]. Training produces the hidden-layer coefficient matrix W, and the product of a part-of-speech type's one-hot vector and W is that type's part-of-speech vector, a multidimensional vector of a form such as [-0.11, 0.26, -0.03, ..., 0.71].
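Multiplying a one-hot vector by the hidden-layer matrix W, as described above, amounts to selecting one row of W. The sketch below shows this with a tiny hypothetical W (rows in the order noun, adjective, verb, adverb, as in the example; values invented for illustration); it does not show the training that produces W. The function names are hypothetical, and indices are 0-based here.

```python
from typing import List


def one_hot(index: int, size: int) -> List[float]:
    """0-1 (one-hot) vector with a 1 at the given 0-based position."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def pos_vector(one_hot_vec: List[float], W: List[List[float]]) -> List[float]:
    """Row vector (1 x M) times hidden-layer matrix W (M x d) gives a 1 x d vector."""
    dim = len(W[0])
    return [sum(one_hot_vec[m] * W[m][j] for m in range(len(W))) for j in range(dim)]


# Hypothetical 4 x 3 hidden-layer matrix; rows: noun, adjective, verb, adverb.
W = [[-0.11, 0.26, -0.03],
     [0.42, -0.17, 0.08],
     [0.05, 0.33, 0.71],
     [-0.20, 0.09, 0.14]]
```

In practice the lookup `W[index]` is equivalent and cheaper; the matrix product is written out only to mirror the description.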
Specifically, the third input matrix of each word segment may be constructed according to the following formula:
Figure PCTCN2018096258-appb-000010
where n is the position index of a word segment in sentence order, 1 ≤ n ≤ N, and N is the total number of word segments in the sentence text; cl is the row index of the third input matrix, 1 ≤ cl ≤ CoupLen, and CoupLen is the preset coupling length; cvl is the column index of the third input matrix, 1 ≤ cvl ≤ cVecLen, and cVecLen is the length of the part-of-speech vector of any word segment; the part-of-speech vector of the n-th word segment is $\mathrm{CharVec}_n$, and
$$\mathrm{CharVec}_n = (\mathrm{CrVecEm}_{n,1}, \mathrm{CrVecEm}_{n,2}, \ldots, \mathrm{CrVecEm}_{n,cvl}, \ldots, \mathrm{CrVecEm}_{n,cVecLen}),$$
Figure PCTCN2018096258-appb-000011
$\mathrm{FwCrMatrix}_n$ is the third input matrix of the n-th word segment.
The fourth input matrix of each word segment is constructed according to the following formula:
Figure PCTCN2018096258-appb-000012
Figure PCTCN2018096258-appb-000013
$\mathrm{BkCrMatrix}_n$ is the fourth input matrix of the n-th word segment.
Step S107: input the third input matrix of each word segment into a preset third neural network model to obtain a third output vector of each word segment.
The third neural network model is a neural network model that performs forward-order semantic role analysis. Its processing may specifically include the steps shown in FIG. 4:
Step S1071: calculate the third composite vector of each word segment.
Specifically, the third composite vector of each word segment may be calculated according to the following formula:
$$\mathrm{FwCrCpVec}_n = (\mathrm{FwCrCpEm}_{n,1}, \mathrm{FwCrCpEm}_{n,2}, \ldots, \mathrm{FwCrCpEm}_{n,cvl}, \ldots, \mathrm{FwCrCpEm}_{n,cVecLen})$$
where
Figure PCTCN2018096258-appb-000014
ln is the natural logarithm function, tanh is the hyperbolic tangent function, and $\mathrm{FwCrWt}_{cvl}$ and $\mathrm{FwCrWt}'_{cvl}$ are preset weight coefficients.
Step S1072: calculate the first probability value of each semantic role type.
Specifically, the first probability value of each semantic role type may be calculated according to the following formula:
Figure PCTCN2018096258-appb-000015
where l is the index of a semantic role type, 1 ≤ l ≤ L, L is the number of semantic role types, $\mathrm{FwCrWtVec}_l$ is a preset weight vector corresponding to the l-th semantic role type,
Figure PCTCN2018096258-appb-000016
and $\mathrm{FwCrProb}_{n,l}$ is the first probability value that the n-th word segment belongs to the l-th semantic role type.
Step S1073: construct the third output vector of each word segment.
Specifically, the third output vector of each word segment may be constructed according to the following formula:
$$\mathrm{FwCrVec}_n = (\mathrm{FwCrProb}_{n,1}, \mathrm{FwCrProb}_{n,2}, \ldots, \mathrm{FwCrProb}_{n,l}, \ldots, \mathrm{FwCrProb}_{n,L})$$
where $\mathrm{FwCrVec}_n$ is the third output vector of the n-th word segment.
Step S108: input the fourth input matrix of each word segment into a preset fourth neural network model to obtain a fourth output vector of each word segment.
The fourth neural network model is a neural network model that performs reverse-order semantic role analysis. Its processing may specifically include the steps shown in FIG. 5:
Step S1081: calculate the fourth composite vector of each word segment.
Specifically, the fourth composite vector of each word segment may be calculated according to the following formula:
$$\mathrm{BkCrCpVec}_n = (\mathrm{BkCrCpEm}_{n,1}, \mathrm{BkCrCpEm}_{n,2}, \ldots, \mathrm{BkCrCpEm}_{n,cvl}, \ldots, \mathrm{BkCrCpEm}_{n,cVecLen})$$
where
Figure PCTCN2018096258-appb-000017
$\mathrm{BkCrWt}_{cvl}$ and $\mathrm{BkCrWt}'_{cvl}$ are preset weight coefficients.
Step S1082: calculate the second probability value of each semantic role type.
Specifically, the second probability value of each semantic role type may be calculated according to the following formula:
Figure PCTCN2018096258-appb-000018
where $\mathrm{BkCrWtVec}_l$ is a preset weight vector corresponding to the l-th semantic role type, and $\mathrm{BkCrProb}_{n,l}$ is the second probability value that the n-th word segment belongs to the l-th semantic role type.
Step S1083: construct the fourth output vector of each word segment.
Specifically, the fourth output vector of each word segment may be constructed according to the following formula:
$$\mathrm{BkCrVec}_n = (\mathrm{BkCrProb}_{n,1}, \mathrm{BkCrProb}_{n,2}, \ldots, \mathrm{BkCrProb}_{n,l}, \ldots, \mathrm{BkCrProb}_{n,L})$$
where $\mathrm{BkCrVec}_n$ is the fourth output vector of the n-th word segment.
Step S109: determine the semantic role type of each word segment according to its third output vector and fourth output vector.
Specifically, the semantic role probability vector of each word segment may be calculated according to the following formula:
$$\mathrm{CrProbVec}_n = (\mathrm{CrProb}_{n,1}, \mathrm{CrProb}_{n,2}, \ldots, \mathrm{CrProb}_{n,l}, \ldots, \mathrm{CrProb}_{n,L})$$
where $\mathrm{CrProb}_{n,l} = \xi_1 \cdot \mathrm{FwCrProb}_{n,l} + \xi_2 \cdot \mathrm{BkCrProb}_{n,l}$, $\xi_1$ and $\xi_2$ are preset weight coefficients, and $\mathrm{CrProbVec}_n$ is the semantic role probability vector of the n-th word segment.
The semantic role type of each word segment is then determined according to the following formula:
$$\mathrm{RoleSeq}_n = \arg\max(\mathrm{CrProbVec}_n)$$
where argmax is the arg max function and $\mathrm{RoleSeq}_n$ is the semantic role type index of the n-th word segment. That is, the semantic role type corresponding to the largest element of the n-th word segment's semantic role probability vector is taken as the semantic role type of that word segment.
In summary, in both of the two most critical processing stages (part-of-speech analysis and semantic role analysis), the embodiments of the present application use two neural network models: the originally complex neural network model is split into relatively simple ones, and their outputs are combined to obtain the result. Because the model structure is simplified, the amount of computation is greatly reduced and analysis efficiency is improved.
Corresponding to the semantic role analysis method described in the above embodiment, FIG. 6 shows a structural diagram of an embodiment of a semantic role analysis apparatus provided by an embodiment of the present application.
In this embodiment, a semantic role analysis apparatus may include:
a word segmentation module 601, configured to perform word segmentation on the sentence text to obtain the word segments that make up the sentence text;
a word vector lookup module 602, configured to look up the word vector of each word segment in a preset word vector database, where the word vector database records the correspondence between words and word vectors;
a word vector matrix construction module 603, configured to construct a first input matrix and a second input matrix for each word segment from the word vectors;
a first processing module 604, configured to input the first input matrix of each word segment into a preset first neural network model to obtain a first output vector of each word segment, where the first neural network model performs forward-order part-of-speech analysis;
a second processing module 605, configured to input the second input matrix of each word segment into a preset second neural network model to obtain a second output vector of each word segment, where the second neural network model performs reverse-order part-of-speech analysis;
a part-of-speech type determination module 606, configured to determine the part-of-speech type of each word segment according to its first output vector and second output vector;
a part-of-speech vector lookup module 607, configured to look up, in a preset part-of-speech vector database, the part-of-speech vector corresponding to the part-of-speech type of each word segment, where the part-of-speech vector database records the correspondence between part-of-speech types and part-of-speech vectors;
a part-of-speech vector matrix construction module 608, configured to construct a third input matrix and a fourth input matrix for each word segment from the part-of-speech vectors;
a third processing module 609, configured to input the third input matrix of each word segment into a preset third neural network model to obtain a third output vector of each word segment, where the third neural network model performs forward-order semantic role analysis;
a fourth processing module 610, configured to input the fourth input matrix of each word segment into a preset fourth neural network model to obtain a fourth output vector of each word segment, where the fourth neural network model performs reverse-order semantic role analysis;
a semantic role type determination module 611, configured to determine the semantic role type of each word segment according to its third output vector and fourth output vector.
The specific implementation of the semantic role analysis apparatus is essentially the same as the embodiments of the semantic role analysis method described above; reference may be made to the relevant description in the method embodiments, which is not repeated here.
FIG. 7 shows a schematic block diagram of a semantic role analysis terminal device provided by an embodiment of the present application. For ease of description, only the parts related to the embodiments of the present application are shown.
In this embodiment, the semantic role analysis terminal device 7 may be a computing device such as a mobile phone, tablet computer, desktop computer, notebook computer, palmtop computer, or cloud server. The terminal device 7 may include a processor 70, a memory 71, and computer-readable instructions 72 stored in the memory 71 and executable on the processor 70, for example computer-readable instructions that perform the semantic role analysis method described above. When executing the computer-readable instructions 72, the processor 70 implements the steps in the embodiments of the semantic role analysis method described above.
If the functional units in the embodiments of the present application are implemented as software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes several computer-readable instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application.

Claims (20)

  1. A semantic role analysis method, comprising:
    performing word segmentation on a sentence text to obtain the word segments that make up the sentence text;
    looking up the word vector of each word segment in a preset word vector database, and constructing a first input matrix and a second input matrix for each word segment from the word vectors, wherein the word vector database records the correspondence between words and word vectors;
    inputting the first input matrix of each word segment into a preset first neural network model to obtain a first output vector of each word segment, wherein the first neural network model is a neural network model that performs forward-order part-of-speech analysis;
    inputting the second input matrix of each word segment into a preset second neural network model to obtain a second output vector of each word segment, wherein the second neural network model is a neural network model that performs reverse-order part-of-speech analysis;
    determining the part-of-speech type of each word segment according to its first output vector and second output vector;
    looking up, in a preset part-of-speech vector database, the part-of-speech vector corresponding to the part-of-speech type of each word segment, and constructing a third input matrix and a fourth input matrix for each word segment from the part-of-speech vectors, wherein the part-of-speech vector database records the correspondence between part-of-speech types and part-of-speech vectors;
    inputting the third input matrix of each word segment into a preset third neural network model to obtain a third output vector of each word segment, wherein the third neural network model is a neural network model that performs forward-order semantic role analysis;
    inputting the fourth input matrix of each word segment into a preset fourth neural network model to obtain a fourth output vector of each word segment, wherein the fourth neural network model is a neural network model that performs reverse-order semantic role analysis;
    determining the semantic role type of each word segment according to its third output vector and fourth output vector.
  2. The semantic role analysis method according to claim 1, wherein constructing the first input matrix and the second input matrix of each word segment from the word vectors comprises:
    Constructing the first input matrix of each word segment according to the following formula:
    Figure PCTCN2018096258-appb-100001
    where $n$ is the index of the word segment in sentence order, $1 \le n \le N$, and $N$ is the total number of word segments in the sentence text; $cl$ is the row index of the first input matrix, $1 \le cl \le CoupLen$, and $CoupLen$ is a preset coupling length; $wvl$ is the column index of the first input matrix, $1 \le wvl \le wVecLen$, and $wVecLen$ is the length of the word vector of any word segment; the word vector of the $n$-th word segment is $WordVec_n$, and
    $WordVec_n = (WdVecEm_{n,1}, WdVecEm_{n,2}, \ldots, WdVecEm_{n,wvl}, \ldots, WdVecEm_{n,wVecLen})$,
    Figure PCTCN2018096258-appb-100002
    $FwWdMatrix_n$ is the first input matrix of the $n$-th word segment;
    Constructing the second input matrix of each word segment according to the following formula:
    Figure PCTCN2018096258-appb-100003
    Figure PCTCN2018096258-appb-100004
    $BkWdMatrix_n$ is the second input matrix of the $n$-th word segment.
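The input-matrix formulas of claim 2 survive only as figure references, so the exact row layout cannot be read from this text. Purely as an illustration, the sketch below assumes that each matrix has $CoupLen$ rows of word vectors: the forward matrix holds segment $n$ and its predecessors in reading order, the backward matrix holds segment $n$ and its successors in reverse order, and positions outside the sentence are zero-padded. The function names and this layout are assumptions, not the patented formula; presumably the same helpers would apply to the part-of-speech vectors of claim 5.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[Vector]


def _check(vectors: Sequence[Vector], n: int) -> None:
    # n is 1-based, matching 1 <= n <= N in the claims.
    if not vectors or not 1 <= n <= len(vectors):
        raise ValueError("n must satisfy 1 <= n <= N for a non-empty sentence")


def forward_matrix(vectors: Sequence[Vector], n: int, coup_len: int) -> Matrix:
    """Assumed layout: row cl (1..coup_len) holds segment n - coup_len + cl."""
    _check(vectors, n)
    dim = len(vectors[0])
    rows = []
    for cl in range(1, coup_len + 1):
        idx = n - coup_len + cl  # 1-based segment index; ends at n
        rows.append(list(vectors[idx - 1]) if idx >= 1 else [0.0] * dim)
    return rows


def backward_matrix(vectors: Sequence[Vector], n: int, coup_len: int) -> Matrix:
    """Assumed layout: row cl holds segment n + coup_len - cl (read right to left)."""
    _check(vectors, n)
    dim = len(vectors[0])
    total = len(vectors)
    rows = []
    for cl in range(1, coup_len + 1):
        idx = n + coup_len - cl  # walks from n + coup_len - 1 down to n
        rows.append(list(vectors[idx - 1]) if idx <= total else [0.0] * dim)
    return rows


if __name__ == "__main__":
    vecs = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]  # three toy word vectors
    print(forward_matrix(vecs, 2, 3))   # [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
    print(backward_matrix(vecs, 2, 3))  # [[0.0, 0.0], [3.0, 3.0], [2.0, 2.0]]
```

Ending both windows at segment $n$ keeps the two matrices symmetric, so a forward-order and a reverse-order model see the same amount of context on their respective sides.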
  3. The semantic role analysis method according to claim 2, wherein the processing performed by the first neural network model comprises:
    Calculating the first composite vector of each word segment according to the following formula:
    $FwWdCpVec_n = (FwWdCpEm_{n,1}, FwWdCpEm_{n,2}, \ldots, FwWdCpEm_{n,wvl}, \ldots, FwWdCpEm_{n,wVecLen})$, where
    Figure PCTCN2018096258-appb-100005
    $\ln$ is the natural logarithm function, $\tanh$ is the hyperbolic tangent function, and $FwWdWt_{wvl}$ and $FwWdWt'_{wvl}$ are preset weight coefficients;
    Calculating the first probability value of each part-of-speech type according to the following formula:
    Figure PCTCN2018096258-appb-100006
    where $m$ is the index of the part-of-speech type, $1 \le m \le M$, $M$ is the number of part-of-speech types, and $FwWdWtVec_m$ is a preset weight vector corresponding to the $m$-th part-of-speech type,
    Figure PCTCN2018096258-appb-100007
    and $FwWdProb_{n,m}$ is the first probability value that the $n$-th word segment is of the $m$-th part-of-speech type;
    Constructing the first output vector of each word segment according to the following formula:
    $FwWdVec_n = (FwWdProb_{n,1}, FwWdProb_{n,2}, \ldots, FwWdProb_{n,m}, \ldots, FwWdProb_{n,M})$
    where $FwWdVec_n$ is the first output vector of the $n$-th word segment;
    The processing performed by the second neural network model comprises:
    Calculating the second composite vector of each word segment according to the following formula:
    $BkWdCpVec_n = (BkWdCpEm_{n,1}, BkWdCpEm_{n,2}, \ldots, BkWdCpEm_{n,wvl}, \ldots, BkWdCpEm_{n,wVecLen})$, where
    Figure PCTCN2018096258-appb-100008
    $BkWdWt_{wvl}$ and $BkWdWt'_{wvl}$ are preset weight coefficients;
    Calculating the second probability value of each part-of-speech type according to the following formula:
    Figure PCTCN2018096258-appb-100009
    where $BkWdWtVec_m$ is a preset weight vector corresponding to the $m$-th part-of-speech type, and $BkWdProb_{n,m}$ is the second probability value that the $n$-th word segment is of the $m$-th part-of-speech type;
    Constructing the second output vector of each word segment according to the following formula:
    $BkWdVec_n = (BkWdProb_{n,1}, BkWdProb_{n,2}, \ldots, BkWdProb_{n,m}, \ldots, BkWdProb_{n,M})$
    where $BkWdVec_n$ is the second output vector of the $n$-th word segment.
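The composite-vector and probability formulas of claim 3 survive only as figure references, so the exact expressions are not recoverable from this text. The sketch below assumes a common choice, a softmax over the dot products between a composite vector and each part-of-speech type's weight vector, to illustrate how an output vector such as $FwWdVec_n$ could be formed from $FwWdCpVec_n$ and $FwWdWtVec_1, \ldots, FwWdWtVec_M$. The function name and the softmax form are assumptions, not the patented formula, and the composite vector is taken as a given input.

```python
import math
from typing import List, Sequence


def output_vector(composite: Sequence[float],
                  weight_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Assumed softmax: prob_m = exp(w_m . c) / sum_k exp(w_k . c)."""
    if any(len(wv) != len(composite) for wv in weight_vectors):
        raise ValueError("each weight vector must match the composite vector length")
    # One score per part-of-speech type: dot product of its weight vector and c.
    scores = [sum(w * c for w, c in zip(wv, composite)) for wv in weight_vectors]
    peak = max(scores)  # subtract the max before exp() for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    probs = output_vector([1.0, 0.0], [[2.0, 0.0], [0.0, 2.0]])
    print([round(p, 4) for p in probs])  # [0.8808, 0.1192]
```

Under this assumption every output vector sums to 1, so the forward and backward vectors are on the same scale before they are combined in claim 4.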
  4. The semantic role analysis method according to claim 3, wherein determining the part-of-speech type of each word segment according to its first output vector and second output vector comprises:
    Calculating the part-of-speech probability vector of each word segment according to the following formula:
    $WdProbVec_n = (WdProb_{n,1}, WdProb_{n,2}, \ldots, WdProb_{n,m}, \ldots, WdProb_{n,M})$
    where $WdProb_{n,m} = \eta_1 \cdot FwWdProb_{n,m} + \eta_2 \cdot BkWdProb_{n,m}$, $\eta_1$ and $\eta_2$ are preset weight coefficients, and $WdProbVec_n$ is the part-of-speech probability vector of the $n$-th word segment;
    Determining the part-of-speech type of each word segment according to the following formula:
    $CharSeq_n = \arg\max(WdProbVec_n)$
    where $\arg\max$ returns the index of the largest element of its argument, and $CharSeq_n$ is the part-of-speech type index of the $n$-th word segment.
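The fusion and decoding of claim 4 are fully specified in the text, so they can be sketched directly. In the Python below (function name hypothetical), `eta1` and `eta2` stand for $\eta_1$ and $\eta_2$, and the returned index is 1-based to match $1 \le m \le M$. The claims do not say how ties are broken; this sketch keeps the lowest index, which is what Python's `max` does when several keys are equal.

```python
from typing import List, Sequence, Tuple


def fuse_and_decode(fw_probs: Sequence[float], bk_probs: Sequence[float],
                    eta1: float, eta2: float) -> Tuple[List[float], int]:
    """Return (WdProbVec_n, CharSeq_n) for one word segment."""
    if len(fw_probs) != len(bk_probs):
        raise ValueError("forward and backward vectors must both have length M")
    # WdProb_{n,m} = eta1 * FwWdProb_{n,m} + eta2 * BkWdProb_{n,m}
    combined = [eta1 * f + eta2 * b for f, b in zip(fw_probs, bk_probs)]
    # argmax over m, shifted by one because the claims number types 1..M.
    best = max(range(len(combined)), key=combined.__getitem__)
    return combined, best + 1


if __name__ == "__main__":
    vec, char_seq = fuse_and_decode([0.7, 0.2, 0.1], [0.1, 0.8, 0.1], 0.5, 0.5)
    print(char_seq)  # 2
```

Claim 7 applies the same computation to the semantic role vectors, with $\xi_1$ and $\xi_2$ in place of $\eta_1$ and $\eta_2$.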
  5. The semantic role analysis method according to claim 1, wherein constructing the third input matrix and the fourth input matrix of each word segment from the part-of-speech vectors comprises:
    Constructing the third input matrix of each word segment according to the following formula:
    Figure PCTCN2018096258-appb-100010
    where $n$ is the index of the word segment in sentence order, $1 \le n \le N$, and $N$ is the total number of word segments in the sentence text; $cl$ is the row index of the third input matrix, $1 \le cl \le CoupLen$, and $CoupLen$ is a preset coupling length; $cvl$ is the column index of the third input matrix, $1 \le cvl \le cVecLen$, and $cVecLen$ is the length of the part-of-speech vector of any word segment; the part-of-speech vector of the $n$-th word segment is $CharVec_n$, and
    $CharVec_n = (CrVecEm_{n,1}, CrVecEm_{n,2}, \ldots, CrVecEm_{n,cvl}, \ldots, CrVecEm_{n,cVecLen})$,
    Figure PCTCN2018096258-appb-100011
    $FwCrMatrix_n$ is the third input matrix of the $n$-th word segment;
    Constructing the fourth input matrix of each word segment according to the following formula:
    Figure PCTCN2018096258-appb-100012
    Figure PCTCN2018096258-appb-100013
    $BkCrMatrix_n$ is the fourth input matrix of the $n$-th word segment.
  6. The semantic role analysis method according to claim 5, wherein the processing performed by the third neural network model comprises:
    Calculating the third composite vector of each word segment according to the following formula:
    $FwCrCpVec_n = (FwCrCpEm_{n,1}, FwCrCpEm_{n,2}, \ldots, FwCrCpEm_{n,cvl}, \ldots, FwCrCpEm_{n,cVecLen})$, where
    Figure PCTCN2018096258-appb-100014
    $\ln$ is the natural logarithm function, $\tanh$ is the hyperbolic tangent function, and $FwCrWt_{cvl}$ and $FwCrWt'_{cvl}$ are preset weight coefficients;
    Calculating the first probability value of each semantic role type according to the following formula:
    Figure PCTCN2018096258-appb-100015
    where $l$ is the index of the semantic role type, $1 \le l \le L$, $L$ is the number of semantic role types, and $FwCrWtVec_l$ is a preset weight vector corresponding to the $l$-th semantic role type,
    Figure PCTCN2018096258-appb-100016
    and $FwCrProb_{n,l}$ is the first probability value that the $n$-th word segment is of the $l$-th semantic role type;
    Constructing the third output vector of each word segment according to the following formula:
    $FwCrVec_n = (FwCrProb_{n,1}, FwCrProb_{n,2}, \ldots, FwCrProb_{n,l}, \ldots, FwCrProb_{n,L})$
    where $FwCrVec_n$ is the third output vector of the $n$-th word segment;
    The processing performed by the fourth neural network model comprises:
    Calculating the fourth composite vector of each word segment according to the following formula:
    $BkCrCpVec_n = (BkCrCpEm_{n,1}, BkCrCpEm_{n,2}, \ldots, BkCrCpEm_{n,cvl}, \ldots, BkCrCpEm_{n,cVecLen})$, where
    Figure PCTCN2018096258-appb-100017
    $BkCrWt_{cvl}$ and $BkCrWt'_{cvl}$ are preset weight coefficients;
    Calculating the second probability value of each semantic role type according to the following formula:
    Figure PCTCN2018096258-appb-100018
    where $BkCrWtVec_l$ is a preset weight vector corresponding to the $l$-th semantic role type, and $BkCrProb_{n,l}$ is the second probability value that the $n$-th word segment is of the $l$-th semantic role type;
    Constructing the fourth output vector of each word segment according to the following formula:
    $BkCrVec_n = (BkCrProb_{n,1}, BkCrProb_{n,2}, \ldots, BkCrProb_{n,l}, \ldots, BkCrProb_{n,L})$
    where $BkCrVec_n$ is the fourth output vector of the $n$-th word segment.
  7. The semantic role analysis method according to claim 6, wherein determining the semantic role type of each word segment according to its third output vector and fourth output vector comprises:
    Calculating the semantic role probability vector of each word segment according to the following formula:
    $CrProbVec_n = (CrProb_{n,1}, CrProb_{n,2}, \ldots, CrProb_{n,l}, \ldots, CrProb_{n,L})$
    where $CrProb_{n,l} = \xi_1 \cdot FwCrProb_{n,l} + \xi_2 \cdot BkCrProb_{n,l}$, $\xi_1$ and $\xi_2$ are preset weight coefficients, and $CrProbVec_n$ is the semantic role probability vector of the $n$-th word segment;
    Determining the semantic role type of each word segment according to the following formula:
    $RoleSeq_n = \arg\max(CrProbVec_n)$
    where $\arg\max$ returns the index of the largest element of its argument, and $RoleSeq_n$ is the semantic role type index of the $n$-th word segment.
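A worked instance of the claim 7 fusion, using made-up values purely for illustration ($L = 3$, $\xi_1 = 0.6$, $\xi_2 = 0.4$; none of these numbers come from the patent):

```latex
\begin{aligned}
FwCrVec_n &= (0.7,\ 0.2,\ 0.1), \qquad BkCrVec_n = (0.3,\ 0.6,\ 0.1)\\
CrProb_{n,1} &= 0.6 \times 0.7 + 0.4 \times 0.3 = 0.54\\
CrProb_{n,2} &= 0.6 \times 0.2 + 0.4 \times 0.6 = 0.36\\
CrProb_{n,3} &= 0.6 \times 0.1 + 0.4 \times 0.1 = 0.10\\
RoleSeq_n &= \arg\max(0.54,\ 0.36,\ 0.10) = 1
\end{aligned}
```

In this example the forward model alone favours role 1 and the backward model alone favours role 2; with these weights the fused vector selects role 1.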
  8. A computer-readable storage medium storing computer-readable instructions, wherein the computer-readable instructions, when executed by a processor, implement the following steps:
    Performing word segmentation on the sentence text to obtain the word segments that make up the sentence text;
    Looking up the word vector of each word segment in a preset word vector database, and constructing a first input matrix and a second input matrix for each word segment from the word vectors, the word vector database being a database that records the correspondence between words and word vectors;
    Inputting the first input matrix of each word segment into a preset first neural network model to obtain a first output vector for each word segment, the first neural network model being a neural network model that performs forward-order part-of-speech analysis;
    Inputting the second input matrix of each word segment into a preset second neural network model to obtain a second output vector for each word segment, the second neural network model being a neural network model that performs reverse-order part-of-speech analysis;
    Determining the part-of-speech type of each word segment according to its first output vector and second output vector;
    Looking up, in a preset part-of-speech vector database, the part-of-speech vector corresponding to the part-of-speech type of each word segment, and constructing a third input matrix and a fourth input matrix for each word segment from the part-of-speech vectors, the part-of-speech vector database being a database that records the correspondence between part-of-speech types and part-of-speech vectors;
    Inputting the third input matrix of each word segment into a preset third neural network model to obtain a third output vector for each word segment, the third neural network model being a neural network model that performs forward-order semantic role analysis;
    Inputting the fourth input matrix of each word segment into a preset fourth neural network model to obtain a fourth output vector for each word segment, the fourth neural network model being a neural network model that performs reverse-order semantic role analysis;
    Determining the semantic role type of each word segment according to its third output vector and fourth output vector.
  9. The computer-readable storage medium according to claim 8, wherein constructing the first input matrix and the second input matrix of each word segment from the word vectors comprises:
    Constructing the first input matrix of each word segment according to the following formula:
    Figure PCTCN2018096258-appb-100019
    where $n$ is the index of the word segment in sentence order, $1 \le n \le N$, and $N$ is the total number of word segments in the sentence text; $cl$ is the row index of the first input matrix, $1 \le cl \le CoupLen$, and $CoupLen$ is a preset coupling length; $wvl$ is the column index of the first input matrix, $1 \le wvl \le wVecLen$, and $wVecLen$ is the length of the word vector of any word segment; the word vector of the $n$-th word segment is $WordVec_n$, and
    $WordVec_n = (WdVecEm_{n,1}, WdVecEm_{n,2}, \ldots, WdVecEm_{n,wvl}, \ldots, WdVecEm_{n,wVecLen})$,
    Figure PCTCN2018096258-appb-100020
    $FwWdMatrix_n$ is the first input matrix of the $n$-th word segment;
    Constructing the second input matrix of each word segment according to the following formula:
    Figure PCTCN2018096258-appb-100021
    Figure PCTCN2018096258-appb-100022
    $BkWdMatrix_n$ is the second input matrix of the $n$-th word segment.
  10. The computer-readable storage medium according to claim 9, wherein the processing performed by the first neural network model comprises:
    Calculating the first composite vector of each word segment according to the following formula:
    $FwWdCpVec_n = (FwWdCpEm_{n,1}, FwWdCpEm_{n,2}, \ldots, FwWdCpEm_{n,wvl}, \ldots, FwWdCpEm_{n,wVecLen})$, where
    Figure PCTCN2018096258-appb-100023
    $\ln$ is the natural logarithm function, $\tanh$ is the hyperbolic tangent function, and $FwWdWt_{wvl}$ and $FwWdWt'_{wvl}$ are preset weight coefficients;
    Calculating the first probability value of each part-of-speech type according to the following formula:
    Figure PCTCN2018096258-appb-100024
    where $m$ is the index of the part-of-speech type, $1 \le m \le M$, $M$ is the number of part-of-speech types, and $FwWdWtVec_m$ is a preset weight vector corresponding to the $m$-th part-of-speech type,
    Figure PCTCN2018096258-appb-100025
    and $FwWdProb_{n,m}$ is the first probability value that the $n$-th word segment is of the $m$-th part-of-speech type;
    Constructing the first output vector of each word segment according to the following formula:
    $FwWdVec_n = (FwWdProb_{n,1}, FwWdProb_{n,2}, \ldots, FwWdProb_{n,m}, \ldots, FwWdProb_{n,M})$
    where $FwWdVec_n$ is the first output vector of the $n$-th word segment;
    The processing performed by the second neural network model comprises:
    Calculating the second composite vector of each word segment according to the following formula:
    $BkWdCpVec_n = (BkWdCpEm_{n,1}, BkWdCpEm_{n,2}, \ldots, BkWdCpEm_{n,wvl}, \ldots, BkWdCpEm_{n,wVecLen})$, where
    Figure PCTCN2018096258-appb-100026
    $BkWdWt_{wvl}$ and $BkWdWt'_{wvl}$ are preset weight coefficients;
    Calculating the second probability value of each part-of-speech type according to the following formula:
    Figure PCTCN2018096258-appb-100027
    where $BkWdWtVec_m$ is a preset weight vector corresponding to the $m$-th part-of-speech type, and $BkWdProb_{n,m}$ is the second probability value that the $n$-th word segment is of the $m$-th part-of-speech type;
    Constructing the second output vector of each word segment according to the following formula:
    $BkWdVec_n = (BkWdProb_{n,1}, BkWdProb_{n,2}, \ldots, BkWdProb_{n,m}, \ldots, BkWdProb_{n,M})$
    where $BkWdVec_n$ is the second output vector of the $n$-th word segment.
  11. A semantic role analysis terminal device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-readable instructions, implements the following steps:
    Performing word segmentation on the sentence text to obtain the word segments that make up the sentence text;
    Looking up the word vector of each word segment in a preset word vector database, and constructing a first input matrix and a second input matrix for each word segment from the word vectors, the word vector database being a database that records the correspondence between words and word vectors;
    Inputting the first input matrix of each word segment into a preset first neural network model to obtain a first output vector for each word segment, the first neural network model being a neural network model that performs forward-order part-of-speech analysis;
    Inputting the second input matrix of each word segment into a preset second neural network model to obtain a second output vector for each word segment, the second neural network model being a neural network model that performs reverse-order part-of-speech analysis;
    Determining the part-of-speech type of each word segment according to its first output vector and second output vector;
    Looking up, in a preset part-of-speech vector database, the part-of-speech vector corresponding to the part-of-speech type of each word segment, and constructing a third input matrix and a fourth input matrix for each word segment from the part-of-speech vectors, the part-of-speech vector database being a database that records the correspondence between part-of-speech types and part-of-speech vectors;
    Inputting the third input matrix of each word segment into a preset third neural network model to obtain a third output vector for each word segment, the third neural network model being a neural network model that performs forward-order semantic role analysis;
    Inputting the fourth input matrix of each word segment into a preset fourth neural network model to obtain a fourth output vector for each word segment, the fourth neural network model being a neural network model that performs reverse-order semantic role analysis;
    Determining the semantic role type of each word segment according to its third output vector and fourth output vector.
  12. The semantic role analysis terminal device according to claim 11, wherein constructing the first input matrix and the second input matrix of each word segment from the word vectors comprises:
    Constructing the first input matrix of each word segment according to the following formula:
    Figure PCTCN2018096258-appb-100028
    where $n$ is the index of the word segment in sentence order, $1 \le n \le N$, and $N$ is the total number of word segments in the sentence text; $cl$ is the row index of the first input matrix, $1 \le cl \le CoupLen$, and $CoupLen$ is a preset coupling length; $wvl$ is the column index of the first input matrix, $1 \le wvl \le wVecLen$, and $wVecLen$ is the length of the word vector of any word segment; the word vector of the $n$-th word segment is $WordVec_n$, and
    $WordVec_n = (WdVecEm_{n,1}, WdVecEm_{n,2}, \ldots, WdVecEm_{n,wvl}, \ldots, WdVecEm_{n,wVecLen})$,
    Figure PCTCN2018096258-appb-100029
    $FwWdMatrix_n$ is the first input matrix of the $n$-th word segment;
    Constructing the second input matrix of each word segment according to the following formula:
    Figure PCTCN2018096258-appb-100030
    Figure PCTCN2018096258-appb-100031
    $BkWdMatrix_n$ is the second input matrix of the $n$-th word segment.
  13. The semantic role analysis terminal device according to claim 12, wherein the processing performed by the first neural network model comprises:
    Calculating the first composite vector of each word segment according to the following formula:
    $FwWdCpVec_n = (FwWdCpEm_{n,1}, FwWdCpEm_{n,2}, \ldots, FwWdCpEm_{n,wvl}, \ldots, FwWdCpEm_{n,wVecLen})$, where
    Figure PCTCN2018096258-appb-100032
    $\ln$ is the natural logarithm function, $\tanh$ is the hyperbolic tangent function, and $FwWdWt_{wvl}$ and $FwWdWt'_{wvl}$ are preset weight coefficients;
    Calculating the first probability value of each part-of-speech type according to the following formula:
    Figure PCTCN2018096258-appb-100033
    where $m$ is the index of the part-of-speech type, $1 \le m \le M$, $M$ is the number of part-of-speech types, and $FwWdWtVec_m$ is a preset weight vector corresponding to the $m$-th part-of-speech type,
    Figure PCTCN2018096258-appb-100034
    and $FwWdProb_{n,m}$ is the first probability value that the $n$-th word segment is of the $m$-th part-of-speech type;
    Constructing the first output vector of each word segment according to the following formula:
    $FwWdVec_n = (FwWdProb_{n,1}, FwWdProb_{n,2}, \ldots, FwWdProb_{n,m}, \ldots, FwWdProb_{n,M})$
    where $FwWdVec_n$ is the first output vector of the $n$-th word segment;
    The processing performed by the second neural network model comprises:
    Calculating the second composite vector of each word segment according to the following formula:
    $BkWdCpVec_n = (BkWdCpEm_{n,1}, BkWdCpEm_{n,2}, \ldots, BkWdCpEm_{n,wvl}, \ldots, BkWdCpEm_{n,wVecLen})$, where
    Figure PCTCN2018096258-appb-100035
    $BkWdWt_{wvl}$ and $BkWdWt'_{wvl}$ are preset weight coefficients;
    Calculating the second probability value of each part-of-speech type according to the following formula:
    Figure PCTCN2018096258-appb-100036
    where $BkWdWtVec_m$ is a preset weight vector corresponding to the $m$-th part-of-speech type, and $BkWdProb_{n,m}$ is the second probability value that the $n$-th word segment is of the $m$-th part-of-speech type;
    Constructing the second output vector of each word segment according to the following formula:
    $BkWdVec_n = (BkWdProb_{n,1}, BkWdProb_{n,2}, \ldots, BkWdProb_{n,m}, \ldots, BkWdProb_{n,M})$
    where $BkWdVec_n$ is the second output vector of the $n$-th word segment.
  14. The semantic role analysis terminal device according to claim 13, wherein determining the part-of-speech type of each word segment according to its first output vector and second output vector comprises:
    Calculating the part-of-speech probability vector of each word segment according to the following formula:
    $WdProbVec_n = (WdProb_{n,1}, WdProb_{n,2}, \ldots, WdProb_{n,m}, \ldots, WdProb_{n,M})$
    where $WdProb_{n,m} = \eta_1 \cdot FwWdProb_{n,m} + \eta_2 \cdot BkWdProb_{n,m}$, $\eta_1$ and $\eta_2$ are preset weight coefficients, and $WdProbVec_n$ is the part-of-speech probability vector of the $n$-th word segment;
    Determining the part-of-speech type of each word segment according to the following formula:
    $CharSeq_n = \arg\max(WdProbVec_n)$
    where $\arg\max$ returns the index of the largest element of its argument, and $CharSeq_n$ is the part-of-speech type index of the $n$-th word segment.
  15. The semantic role analysis terminal device according to claim 11, wherein constructing the third input matrix and the fourth input matrix of each word segment from the part-of-speech vectors comprises:
    Constructing the third input matrix of each word segment according to the following formula:
    Figure PCTCN2018096258-appb-100037
    where $n$ is the index of the word segment in sentence order, $1 \le n \le N$, and $N$ is the total number of word segments in the sentence text; $cl$ is the row index of the third input matrix, $1 \le cl \le CoupLen$, and $CoupLen$ is a preset coupling length; $cvl$ is the column index of the third input matrix, $1 \le cvl \le cVecLen$, and $cVecLen$ is the length of the part-of-speech vector of any word segment; the part-of-speech vector of the $n$-th word segment is $CharVec_n$, and
    $CharVec_n = (CrVecEm_{n,1}, CrVecEm_{n,2}, \ldots, CrVecEm_{n,cvl}, \ldots, CrVecEm_{n,cVecLen})$,
    Figure PCTCN2018096258-appb-100038
    $FwCrMatrix_n$ is the third input matrix of the $n$-th word segment;
    Constructing the fourth input matrix of each word segment according to the following formula:
    Figure PCTCN2018096258-appb-100039
    Figure PCTCN2018096258-appb-100040
    $BkCrMatrix_n$ is the fourth input matrix of the $n$-th word segment.
  16. 根据权利要求15所述的语义角色分析终端设备,其特征在于,所述第三神经网络模型的处理过程包括:The semantic role analysis terminal device according to claim 15, wherein the processing process of the third neural network model comprises:
    根据下式分别计算各个分词的第三复合向量:Calculate the third composite vector of each participle according to the following formula:
    FwCrCpVec n=(FwCrCpEm n,1,FwCrCpEm n,2,......,FwCrCpEm n,vl,......,FwCrCpEm n,cVecLen)其中, FwCrCpVec n = (FwCrCpEm n,1 , FwCrCpEm n,2 , . . . , FwCrCpEm n, vl , . . . , FwCrCpEm n, cVecLen )
    Figure PCTCN2018096258-appb-100041
    Figure PCTCN2018096258-appb-100041
    ln为自然对数函数,tanh为双曲正切函数,FwCrWt cvl、FwCrWt′ cvl均为预设的权重系数; Ln is a natural logarithm function, tanh is a hyperbolic tangent function, and FwCrWt cvl and FwCrWt' cvl are preset weight coefficients;
    Calculate the first probability value of each semantic role type according to the following formula:
    [Formula image PCTCN2018096258-appb-100042 not reproduced]
    where l is the index of the semantic role type, 1 ≤ l ≤ L, L is the number of semantic role types, $FwCrWtVec_l$ is the preset weight vector corresponding to the l-th semantic role type,
    [Formula image PCTCN2018096258-appb-100043 not reproduced]
    and $FwCrProb_{n,l}$ is the first probability value that the n-th word segment belongs to the l-th semantic role type;
    Construct the third output vector of each word segment according to the following formula:
    $FwCrVec_n = (FwCrProb_{n,1}, FwCrProb_{n,2}, \ldots, FwCrProb_{n,l}, \ldots, FwCrProb_{n,L})$
    where $FwCrVec_n$ is the third output vector of the n-th word segment;
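The formula images for the third composite vector and the first probability value (appb-100041 to appb-100043) are not reproduced here, so the following Python is only a sketch under a stated assumption: it treats the first probability value as a softmax over the dot products of each preset weight vector $FwCrWtVec_l$ with the composite vector $FwCrCpVec_n$, and it takes the composite vector as already computed. The function name `role_probabilities` is hypothetical.

```python
import math
from typing import List, Sequence


def role_probabilities(composite_vec: Sequence[float],
                       weight_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Assumed softmax: composite_vec stands in for FwCrCpVec_n,
    weight_vectors[l - 1] stands in for FwCrWtVec_l."""
    # One score per semantic role type: dot product of FwCrWtVec_l and FwCrCpVec_n
    scores = [sum(w * x for w, x in zip(wv, composite_vec)) for wv in weight_vectors]
    # Subtract the largest score before exponentiating, for numerical stability
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    # Normalized values form (FwCrProb_{n,1}, ..., FwCrProb_{n,L}), i.e. FwCrVec_n
    return [e / total for e in exps]
```

Under the same assumption, the fourth model would call the same function with $BkCrCpVec_n$ and the $BkCrWtVec_l$ weights, and the part-of-speech models of claim 20 would use the word-level composite vectors with the M part-of-speech weight vectors.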
    The processing performed by the fourth neural network model comprises:
    Calculate the fourth composite vector of each word segment according to the following formula:
    $BkCrCpVec_n = (BkCrCpEm_{n,1}, BkCrCpEm_{n,2}, \ldots, BkCrCpEm_{n,cvl}, \ldots, BkCrCpEm_{n,cVecLen})$, where
    [Formula image PCTCN2018096258-appb-100044 not reproduced]
    $BkCrWt_{cvl}$ and $BkCrWt'_{cvl}$ are preset weight coefficients;
    Calculate the second probability value of each semantic role type according to the following formula:
    [Formula image PCTCN2018096258-appb-100045 not reproduced]
    where $BkCrWtVec_l$ is the preset weight vector corresponding to the l-th semantic role type, and $BkCrProb_{n,l}$ is the second probability value that the n-th word segment belongs to the l-th semantic role type;
    Construct the fourth output vector of each word segment according to the following formula:
    $BkCrVec_n = (BkCrProb_{n,1}, BkCrProb_{n,2}, \ldots, BkCrProb_{n,l}, \ldots, BkCrProb_{n,L})$
    where $BkCrVec_n$ is the fourth output vector of the n-th word segment.
  17. The semantic role analysis terminal device according to claim 16, wherein determining the semantic role type of each word segment according to the third output vector and the fourth output vector of each word segment comprises:
    Calculate the semantic role probability vector of each word segment according to the following formula:
    $CrProbVec_n = (CrProb_{n,1}, CrProb_{n,2}, \ldots, CrProb_{n,l}, \ldots, CrProb_{n,L})$
    where $CrProb_{n,l} = \xi_1 \cdot FwCrProb_{n,l} + \xi_2 \cdot BkCrProb_{n,l}$, $\xi_1$ and $\xi_2$ are preset weight coefficients, and $CrProbVec_n$ is the semantic role probability vector of the n-th word segment;
    Determine the semantic role type of each word segment according to the following formula:
    $RoleSeq_n = \arg\max(CrProbVec_n)$
    where arg max returns the index of the largest element, and $RoleSeq_n$ is the index of the semantic role type of the n-th word segment.
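Claim 17 fully specifies this fusion step, so it can be sketched directly: a weighted sum of the forward and backward probability values for each semantic role type, followed by the index of the largest entry. In this Python sketch the function name `fuse_and_select` is hypothetical, the returned index is 1-based to match 1 ≤ l ≤ L, and ties resolve to the lowest index, a choice the claim does not specify.

```python
from typing import List, Sequence, Tuple


def fuse_and_select(fw_probs: Sequence[float], bk_probs: Sequence[float],
                    xi1: float, xi2: float) -> Tuple[List[float], int]:
    # CrProb_{n,l} = xi1 * FwCrProb_{n,l} + xi2 * BkCrProb_{n,l} for each role type l
    fused = [xi1 * f + xi2 * b for f, b in zip(fw_probs, bk_probs)]
    # arg max over l; max() keeps the first maximum, so ties go to the lowest index
    best = max(range(len(fused)), key=lambda i: fused[i])
    # Return CrProbVec_n and RoleSeq_n as a 1-based role type number
    return fused, best + 1
```

For example, with ξ1 = ξ2 = 0.5, forward probabilities (0.6, 0.3, 0.1) and backward probabilities (0.2, 0.7, 0.1) fuse to (0.4, 0.5, 0.1), so role type 2 is selected even though the forward model alone prefers role type 1.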
  18. A semantic role analysis apparatus, comprising:
    a word segmentation module, configured to perform word segmentation on a sentence text to obtain the word segments constituting the sentence text;
    a word vector lookup module, configured to look up the word vector of each word segment in a preset word vector database, the word vector database being a database that records the correspondence between words and word vectors;
    a word vector matrix construction module, configured to construct the first input matrix and the second input matrix of each word segment according to the word vectors;
    a first processing module, configured to input the first input matrix of each word segment into a preset first neural network model to obtain the first output vector of each word segment, the first neural network model being a neural network model for forward-order part-of-speech analysis;
    a second processing module, configured to input the second input matrix of each word segment into a preset second neural network model to obtain the second output vector of each word segment, the second neural network model being a neural network model for reverse-order part-of-speech analysis;
    a part-of-speech type determination module, configured to determine the part-of-speech type of each word segment according to the first output vector and the second output vector of each word segment;
    a part-of-speech vector lookup module, configured to look up, in a preset part-of-speech vector database, the part-of-speech vector corresponding to the part-of-speech type of each word segment, the part-of-speech vector database being a database that records the correspondence between part-of-speech types and part-of-speech vectors;
    a part-of-speech vector matrix construction module, configured to construct the third input matrix and the fourth input matrix of each word segment according to the part-of-speech vectors;
    a third processing module, configured to input the third input matrix of each word segment into a preset third neural network model to obtain the third output vector of each word segment, the third neural network model being a neural network model for forward-order semantic role analysis;
    a fourth processing module, configured to input the fourth input matrix of each word segment into a preset fourth neural network model to obtain the fourth output vector of each word segment, the fourth neural network model being a neural network model for reverse-order semantic role analysis; and
    a semantic role type determination module, configured to determine the semantic role type of each word segment according to the third output vector and the fourth output vector of each word segment.
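The data flow between the modules of claim 18 can be sketched as a small Python class. Every component is injected as a callable or dictionary stand-in, because the claim does not fix the segmentation method or the network internals. The class name `SemanticRoleAnalyzer`, its field names, the 0-based `n` passed to `build_matrices`, and the assumption that one `combine` function serves both the part-of-speech stage and the semantic role stage are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
Model = Callable[[object], Vector]  # maps an input matrix to an output probability vector


@dataclass
class SemanticRoleAnalyzer:
    """Hypothetical wiring of the claim 18 modules; every field is a stand-in."""
    segment: Callable[[str], List[str]]          # word segmentation module
    word_vectors: Dict[str, Vector]              # preset word vector database
    pos_vectors: Dict[str, Vector]               # preset part-of-speech vector database
    build_matrices: Callable[[List[Vector], int], Tuple[object, object]]  # (forward, backward), 0-based n
    fw_pos_model: Model                          # first neural network model
    bk_pos_model: Model                          # second neural network model
    fw_role_model: Model                         # third neural network model
    bk_role_model: Model                         # fourth neural network model
    pos_types: Sequence[str]                     # output index -> part-of-speech type
    combine: Callable[[Vector, Vector], int]     # fuses two output vectors into a 0-based index

    def analyze(self, sentence: str) -> Tuple[List[str], List[int]]:
        words = self.segment(sentence)
        # Word vector lookup, then part-of-speech typing per word segment
        word_vecs = [self.word_vectors[w] for w in words]
        tags = []
        for n in range(len(words)):
            fw_in, bk_in = self.build_matrices(word_vecs, n)
            idx = self.combine(self.fw_pos_model(fw_in), self.bk_pos_model(bk_in))
            tags.append(self.pos_types[idx])
        # Part-of-speech vector lookup, then semantic role analysis per word segment
        pos_vecs = [self.pos_vectors[t] for t in tags]
        roles = []
        for n in range(len(words)):
            fw_in, bk_in = self.build_matrices(pos_vecs, n)
            # Report 1-based semantic role type numbers, matching RoleSeq_n in claim 17
            roles.append(self.combine(self.fw_role_model(fw_in), self.bk_role_model(bk_in)) + 1)
        return tags, roles
```

With toy stand-ins (whitespace segmentation, identity-like models) the class runs end to end; a real deployment would plug in trained models and a proper word segmenter.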
  19. The semantic role analysis apparatus according to claim 18, wherein the word vector matrix construction module comprises:
    a first input matrix construction unit, configured to construct the first input matrix of each word segment according to the following formula:
    [Formula image PCTCN2018096258-appb-100046 not reproduced]
    where n is the index of the word segment in order of occurrence, 1 ≤ n ≤ N, N is the total number of word segments in the sentence text, cl is the row index of the first input matrix, 1 ≤ cl ≤ CoupLen, CoupLen is a preset coupling length, wvl is the column index of the first input matrix, 1 ≤ wvl ≤ wVecLen, wVecLen is the length of the word vector of any word segment, and the word vector of the n-th word segment is $WordVec_n$, with
    $WordVec_n = (WdVecEm_{n,1}, WdVecEm_{n,2}, \ldots, WdVecEm_{n,wvl}, \ldots, WdVecEm_{n,wVecLen})$,
    [Formula image PCTCN2018096258-appb-100047 not reproduced]
    $FwWdMatrix_n$ is the first input matrix of the n-th word segment;
    a second input matrix construction unit, configured to construct the second input matrix of each word segment according to the following formula:
    [Formula image PCTCN2018096258-appb-100048 not reproduced]
    [Formula image PCTCN2018096258-appb-100049 not reproduced]
    $BkWdMatrix_n$ is the second input matrix of the n-th word segment.
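Because the matrix formulas (appb-100046 to appb-100049) are images that are not reproduced, the exact layout of the first and second input matrices cannot be read from this text. The Python sketch below assumes one plausible layout consistent with the stated dimensions (CoupLen rows indexed by cl, one vector of length wVecLen per row): the forward matrix stacks the vectors of the CoupLen word segments ending at segment n, the backward matrix stacks those starting at segment n in reverse order, and positions outside the sentence are zero-padded. The function names are hypothetical, and n is 1-based as in the claim.

```python
from typing import List, Sequence


def _stack_rows(vectors: Sequence[Sequence[float]], indices: Sequence[int]) -> List[List[float]]:
    # Copy the vector of each 1-based word segment index; zero-pad indices outside 1..N
    width = len(vectors[0])
    return [list(vectors[i - 1]) if 1 <= i <= len(vectors) else [0.0] * width
            for i in indices]


def forward_matrix(vectors: Sequence[Sequence[float]], n: int, coup_len: int) -> List[List[float]]:
    # Assumed layout: row cl (1..CoupLen) holds segment n - CoupLen + cl, so the last row is segment n
    return _stack_rows(vectors, [n - coup_len + cl for cl in range(1, coup_len + 1)])


def backward_matrix(vectors: Sequence[Sequence[float]], n: int, coup_len: int) -> List[List[float]]:
    # Assumed layout: row cl holds segment n + CoupLen - cl, later segments first, segment n last
    return _stack_rows(vectors, [n + coup_len - cl for cl in range(1, coup_len + 1)])
```

For three one-dimensional vectors `[[1.0], [2.0], [3.0]]`, `forward_matrix(vecs, 1, 3)` gives `[[0.0], [0.0], [1.0]]` and `backward_matrix(vecs, 1, 3)` gives `[[3.0], [2.0], [1.0]]`. Under the same assumption, the third and fourth input matrices would be built the same way from the part-of-speech vectors.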
  20. The semantic role analysis apparatus according to claim 19, wherein the first processing module comprises:
    a first composite vector calculation unit, configured to calculate the first composite vector of each word segment according to the following formula:
    $FwWdCpVec_n = (FwWdCpEm_{n,1}, FwWdCpEm_{n,2}, \ldots, FwWdCpEm_{n,wvl}, \ldots, FwWdCpEm_{n,wVecLen})$, where
    [Formula image PCTCN2018096258-appb-100050 not reproduced]
    ln is the natural logarithm function, tanh is the hyperbolic tangent function, and $FwWdWt_{wvl}$ and $FwWdWt'_{wvl}$ are preset weight coefficients;
    a part-of-speech first probability value calculation unit, configured to calculate the first probability value of each part-of-speech type according to the following formula:
    [Formula image PCTCN2018096258-appb-100051 not reproduced]
    where m is the index of the part-of-speech type, 1 ≤ m ≤ M, M is the number of part-of-speech types, $FwWdWtVec_m$ is the preset weight vector corresponding to the m-th part-of-speech type,
    [Formula image PCTCN2018096258-appb-100052 not reproduced]
    and $FwWdProb_{n,m}$ is the first probability value that the n-th word segment belongs to the m-th part-of-speech type;
    a first output vector construction unit, configured to construct the first output vector of each word segment according to the following formula:
    $FwWdVec_n = (FwWdProb_{n,1}, FwWdProb_{n,2}, \ldots, FwWdProb_{n,m}, \ldots, FwWdProb_{n,M})$
    where $FwWdVec_n$ is the first output vector of the n-th word segment;
    the second processing module comprises:
    a second composite vector calculation unit, configured to calculate the second composite vector of each word segment according to the following formula:
    $BkWdCpVec_n = (BkWdCpEm_{n,1}, BkWdCpEm_{n,2}, \ldots, BkWdCpEm_{n,wvl}, \ldots, BkWdCpEm_{n,wVecLen})$, where
    [Formula image PCTCN2018096258-appb-100053 not reproduced]
    $BkWdWt_{wvl}$ and $BkWdWt'_{wvl}$ are preset weight coefficients;
    a part-of-speech second probability value calculation unit, configured to calculate the second probability value of each part-of-speech type according to the following formula:
    [Formula image PCTCN2018096258-appb-100054 not reproduced]
    where $BkWdWtVec_m$ is the preset weight vector corresponding to the m-th part-of-speech type, and $BkWdProb_{n,m}$ is the second probability value that the n-th word segment belongs to the m-th part-of-speech type;
    a second output vector construction unit, configured to construct the second output vector of each word segment according to the following formula:
    $BkWdVec_n = (BkWdProb_{n,1}, BkWdProb_{n,2}, \ldots, BkWdProb_{n,m}, \ldots, BkWdProb_{n,M})$
    where $BkWdVec_n$ is the second output vector of the n-th word segment.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810309685.6A CN108804411B (en) 2018-04-09 2018-04-09 A kind of semantic role analysis method, computer readable storage medium and terminal device
CN201810309685.6 2018-04-09

Publications (1)

Publication Number Publication Date
WO2019196236A1 true WO2019196236A1 (en) 2019-10-17

Family

ID=64095371

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/096258 WO2019196236A1 (en) 2018-04-09 2018-07-19 Semantic role analysis method, readable storage medium, terminal device and apparatus

Country Status (2)

Country Link
CN (1) CN108804411B (en)
WO (1) WO2019196236A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110164450B (en) * 2019-05-09 2023-11-28 腾讯科技(深圳)有限公司 Login method, login device, playing equipment and storage medium

Citations (2)

Publication number Priority date Publication date Assignee Title
CN102662931A (en) * 2012-04-13 2012-09-12 厦门大学 Semantic role labeling method based on synergetic neural network
CN104462066A (en) * 2014-12-24 2015-03-25 北京百度网讯科技有限公司 Method and device for labeling semantic role

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
US8180633B2 (en) * 2007-03-08 2012-05-15 Nec Laboratories America, Inc. Fast semantic extraction using a neural network architecture
US8392436B2 (en) * 2008-02-07 2013-03-05 Nec Laboratories America, Inc. Semantic search via role labeling
CN104021115A (en) * 2014-06-13 2014-09-03 北京理工大学 Chinese comparative sentence recognizing method and device based on neural network
CN107480122B (en) * 2017-06-26 2020-05-08 迈吉客科技(北京)有限公司 Artificial intelligence interaction method and artificial intelligence interaction device

Also Published As

Publication number Publication date
CN108804411A (en) 2018-11-13
CN108804411B (en) 2019-10-29

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18914135

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 26.01.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18914135

Country of ref document: EP

Kind code of ref document: A1