WO2020211611A1 - Method and apparatus for generating hidden states in a recurrent neural network for language processing - Google Patents

Method and apparatus for generating hidden states in a recurrent neural network for language processing

Info

Publication number
WO2020211611A1
Authority
WO
WIPO (PCT)
Prior art keywords
word vector
vector
regional
word
matrix
Prior art date
Application number
PCT/CN2020/081177
Other languages
English (en)
French (fr)
Inventor
孟凡东
张金超
周杰
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司 filed Critical 腾讯科技(深圳)有限公司
Priority to EP20790836.9A priority Critical patent/EP3958148A4/en
Priority to JP2021525643A priority patent/JP7299317B2/ja
Publication of WO2020211611A1 publication Critical patent/WO2020211611A1/zh
Priority to US17/332,318 priority patent/US20210286953A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/10 Pre-processing; Data cleansing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • This application relates to the field of computer technology, in particular to a method, device, computer-readable storage medium, and computer equipment for generating hidden states in a recurrent neural network for language processing.
  • Natural Language Processing (NLP) realizes the long-awaited goal of "communicating with computers using natural language". However, processing variable-length word sequences remains a major challenge in NLP.
  • A Recurrent Neural Network (RNN) is a type of recursive neural network that takes sequence data as input, recurses along the evolution direction of the sequence, and in which all nodes (recurrent units) are connected in a chain.
  • The emergence of recurrent neural networks makes it possible to process variable-length word sequences.
  • This application provides a method, device, computer-readable storage medium, and computer equipment for generating hidden states in a recurrent neural network for language processing.
  • This technical solution has a high capture rate for complex linguistic laws.
  • the technical scheme is as follows:
  • a method for generating hidden states in a recurrent neural network for language processing is provided.
  • the method is applied to a computer device, and the method includes:
  • generating regional word vectors of at least two dimensions of the target word vector input at the first moment; combining the regional word vectors to obtain combined regional word vectors of at least two dimensions; performing aggregation transformation processing on each combined regional word vector based on a feedforward neural network to obtain an aggregated word vector corresponding to the target word vector; and generating a target hidden state corresponding to the target word vector based on the aggregated word vector.
  • a device for generating hidden states in a recurrent neural network for language processing includes:
  • a regional word vector generating module which is used to generate a regional word vector of at least two dimensions of the target word vector input at the first moment;
  • a regional word vector combination module configured to combine each of the regional word vectors to obtain a combined regional word vector of at least two dimensions
  • An aggregation transformation processing module configured to perform aggregation transformation processing on each of the combined region word vectors based on a feedforward neural network to obtain an aggregation word vector corresponding to the target word vector;
  • the target hidden state generating module is configured to generate the target hidden state corresponding to the target word vector based on the aggregated word vector.
  • a computer-readable storage medium stores a computer program.
  • when the computer program is executed by a processor, the processor performs the method for generating hidden states in a recurrent neural network for language processing according to any of the above aspects and their optional embodiments.
  • a computer device includes a memory and a processor, the memory stores a computer program, and when the computer program is executed by the processor, the processor performs the method for generating hidden states in a recurrent neural network for language processing according to any of the above aspects and their optional implementations.
  • This method generates regional word vectors of at least two dimensions from the target word vector input at the first moment, so that a single-dimensional target word vector corresponds to regional word vectors of multiple dimensions, and combines the regional word vectors to obtain combined regional word vectors of at least two dimensions. It then performs aggregation transformation processing on each combined regional word vector based on the feedforward neural network to obtain the aggregated word vector corresponding to the target word vector, so that the target word vector at each moment has a corresponding aggregated word vector and the target hidden state corresponding to the target word vector can be generated on the basis of the aggregated word vector. Because the aggregated word vector is obtained by multi-dimensional conversion processing of the target word vector, the generated target hidden state has a high capture rate for complex linguistic laws.
  • In other words, this method obtains the aggregated word vector after multi-dimensional conversion by performing a deep multi-region combination calculation on the target word vector, which enhances the linguistic regularities captured in the word vector, for example by strengthening long-distance dependencies, so that the target hidden state generated from the aggregated word vector can capture complex linguistic laws with greater probability.
  • Fig. 1 is a structural block diagram of a computer device provided by an exemplary embodiment of the present application
  • Fig. 2 is a flowchart of a method for generating hidden states in a recurrent neural network for language processing provided by an exemplary embodiment of the present application;
  • FIG. 3 is a flowchart of a method for generating hidden states in a recurrent neural network for language processing provided by another exemplary embodiment of the present application;
  • Fig. 4 is an exemplary diagram of a vector node provided by an exemplary embodiment of the present application.
  • Fig. 5 is an exemplary diagram of an adjacency matrix provided by an exemplary embodiment of the present application.
  • Fig. 6 is an exemplary diagram of a degree matrix provided by an exemplary embodiment of the present application.
  • Fig. 7 is an exemplary diagram of a regional word vector matrix provided by an exemplary embodiment of the present application.
  • FIG. 8 is a flowchart of a method for generating hidden states in a recurrent neural network for language processing provided by another exemplary embodiment of the present application;
  • FIG. 9 is an example diagram of a method for calculating a regional word vector provided by an exemplary embodiment of the present application.
  • FIG. 10 is a flowchart of a method for generating hidden states in a recurrent neural network for language processing provided by another exemplary embodiment of the present application;
  • FIG. 11 is an example diagram of a method for generating aggregated word vectors provided by an exemplary embodiment of the present application.
  • Fig. 12 is a structural block diagram of an apparatus for generating hidden states in a recurrent neural network for language processing provided by an exemplary embodiment of the present application.
  • the method for generating hidden states in a recurrent neural network for language processing provided in this application can be applied to the computer device 100 shown in FIG. 1.
  • the computer device 100 includes a memory 101 and a processor 102.
  • the memory 101 may include a non-volatile storage medium and internal memory.
  • a computer program is stored in the memory 101, and when the computer program is executed by the processor 102, the method for generating a hidden state in a recurrent neural network for language processing provided in this application can be realized.
  • the computer device 100 further includes a network interface 103, and the network interface 103 is used to connect the computer device 100 to a wired or wireless network.
  • the computer device 100 further includes a system bus 104, wherein the memory 101 is electrically connected to the processor 102 and the network interface 103 through the system bus 104, respectively.
  • the computer device 100 may be a terminal or a server. It can be understood that when the computer device 100 is a terminal, the computer device 100 may also include a display screen, an input device, and the like.
  • the terminal can be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices, and the servers can be implemented by independent servers or a server cluster composed of multiple servers.
  • a method for generating hidden states in a recurrent neural network for language processing is provided.
  • the method is mainly applied to the computer device in FIG. 1 as an example.
  • the method for generating hidden states in a recurrent neural network for language processing specifically includes the following steps:
  • S202 Generate a regional word vector of at least two dimensions of the target word vector input at the first moment.
  • the word vector refers to the real number vector of the corresponding word in the predefined vector space.
  • For example, the real number vector of "dog" in the predefined vector space can be (0.2, 0.2, 0.4); then (0.2, 0.2, 0.4) is the word vector of "dog".
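  • A minimal sketch of such a mapping (the embedding table and the entry for "cat" are illustrative assumptions, not values from this application):

```python
import numpy as np

# Hypothetical predefined vector space: each word maps to a real-number vector.
embedding_table = {
    "dog": np.array([0.2, 0.2, 0.4]),
    "cat": np.array([0.1, 0.3, 0.5]),  # illustrative value only
}

word_vector = embedding_table["dog"]  # (0.2, 0.2, 0.4) is the word vector of "dog"
```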
  • the target word vector refers to the word vector input at the first moment.
  • the regional word vectors refer to the word vectors of multiple different dimensions that correspond to a word vector of a single dimension.
  • the first moment is the moment when the target word vector is input; for example, the first moment may include the current moment, that is, the moment currently indicated by the clock.
  • when the computer device detects a target word vector input at the first moment, it reads the target word vector input at the first moment and triggers the regional word vector generation instruction for the target word vector.
  • the computer device converts the low-dimensional target word vector into regional word vectors of at least two dimensions according to the regional word vector generation instruction. In this way, the target word vector input by the computer device at each moment corresponds to regional word vectors of at least two dimensions.
  • at each moment in the vector sequence X, the computer device generates regional word vectors of at least two dimensions of the target word vector input at that moment.
  • a social application program for communication is installed in the computer device, and a sub-application program for man-machine dialogue runs in the social application program for communication.
  • when the computer device detects that the sub-application for human-machine dialogue receives variable-length voice information, it converts the voice information received at each moment into text information and maps the text information to the target word vector.
  • the variable-length voice information will eventually form a vector sequence, which includes the target word vector corresponding to the text information of the voice information received at each moment.
  • the server can receive various target word vectors that have been converted by other terminals, and generate at least two-dimensional regional word vectors of the target word vector input at each moment.
  • the server can also directly receive the variable-length voice information received by other terminals through a sub-application for man-machine dialogue, convert the voice information received at each moment into text information, and map the text information to the target word vector, so that the variable-length voice information will eventually form a vector sequence that includes the target word vector corresponding to the text information of the voice information received at each moment; the server then generates regional word vectors of at least two dimensions of the target word vector input at each moment.
  • the regional word vector generation instruction may carry the first preset dimension.
  • when the computer device converts the low-dimensional target word vector into regional word vectors of at least two dimensions according to the regional word vector generation instruction, it can convert the low-dimensional target word vector into regional word vectors of the first preset dimension in accordance with the first preset dimension.
  • For example, the first moment is T and the first preset dimension is N; the computer device detects that the target word vector X_T is input at the first moment T, and the computer device needs to convert the single-dimensional target word vector X_T into regional word vectors of N dimensions, where N is greater than 1.
  • the combined regional word vector refers to the word vector obtained by combining various regional word vectors.
  • For example, the target word vector has regional word vectors of N dimensions; the computer device can obtain combined regional word vectors of J dimensions after performing a combination calculation on the N-dimensional regional word vectors, where J is greater than or equal to 2.
  • the computer device is preset with a regional vector combination method. After the computer device generates the regional word vectors of the target word vector, it obtains the preset regional vector combination method, which includes the second preset dimension. The computer device then performs a combination calculation on the regional word vectors of the target word vector according to the preset regional vector combination method to obtain combined regional word vectors of the second preset dimension.
  • the regional vector combination method refers to the method of combining various regional word vectors.
  • S206 Perform aggregation transformation processing on each combined region word vector based on the feedforward neural network to obtain an aggregate word vector corresponding to the target word vector.
  • the feedforward neural network is a neural network in which each neuron is arranged in layers.
  • Aggregation transformation processing refers to the process of aggregation processing and transformation processing on each combined region word vector.
  • the aggregation word vector refers to the word vector obtained after the aggregation processing and transformation processing are performed on the word vectors of each combination area.
  • the computer device may perform a transformation on each combined regional word vector based on the feedforward neural network to obtain intermediate regional word vectors with the same dimension as the combined regional word vectors.
  • the computer device performs aggregation processing on each of the obtained intermediate region word vectors to obtain an intermediate aggregation word vector.
  • the computer device can perform a linear transformation on the obtained intermediate aggregate word vector to obtain the aggregate word vector corresponding to the target word vector.
  • S208 Generate a target hidden state corresponding to the target word vector based on the aggregated word vector.
  • the hidden state refers to the state output by the hidden layer of the recurrent neural network, that is, the system status of the recurrent neural network.
  • the target hidden state refers to the system status of the recurrent neural network at the first moment.
  • the computer device can obtain the historical hidden state of the historical word vector at the previous moment, and add the aggregated word vector of the target word vector on the basis of the historical hidden state to calculate and generate the target hidden state of the target word vector. It is understandable that the historical hidden state of the historical word vector at the previous moment is also generated based on the aggregated word vector of the historical word vector, which is likewise obtained by performing multi-dimensional conversion processing on the historical word vector.
  • the above method for generating hidden states in a recurrent neural network for language processing generates regional word vectors of at least two dimensions from the target word vector input at the first moment, so that a single-dimensional target word vector corresponds to regional word vectors of multiple dimensions, and combines the regional word vectors to obtain combined regional word vectors of at least two dimensions. Then, based on the feedforward neural network, the combined regional word vectors are aggregated and transformed to obtain the aggregated word vector corresponding to the target word vector.
  • the target word vector at each moment thus has a corresponding aggregated word vector, so that the target hidden state corresponding to the target word vector can be generated on the basis of the aggregated word vector. Since the aggregated word vector is obtained by performing multi-dimensional conversion processing on the target word vector, the target hidden state generated from the aggregated word vector has a high capture rate for complex linguistic laws. For example, when the computer device processes tasks such as handwriting recognition, sequence labeling, sentiment analysis, language model training, and machine translation, the tasks can be completed efficiently even when long-distance dependent language structures are encountered.
  • generating regional word vectors of at least two dimensions of the target word vector input at the first moment includes: obtaining at least two first weight matrices, each of which is used to generate a corresponding regional word vector; determining the target word vector input at the first moment, and obtaining the historical hidden state corresponding to the historical word vector at the previous moment; and generating regional word vectors of at least two dimensions of the target word vector based on the first weight matrices and the historical hidden state.
  • the first weight matrix refers to a weight parameter in matrix form that is learned as the system is trained and is used to generate the corresponding regional word vector. That is, the first weight matrix is a system parameter in matrix form obtained by training the system with sample data.
  • the historical word vector refers to the word vector input by the computer device at the previous moment.
  • the historical hidden state refers to the hidden state corresponding to the word vector input by the computer device at the previous moment.
  • when the computer device detects a target word vector input at the first moment, it reads the target word vector input at the first moment and triggers the regional word vector generation instruction for the target word vector.
  • the computer device obtains the first weight matrix for generating the regional word vector according to the regional word vector generation instruction, and the number of the obtained first weight matrix is the same as the number of dimensions of the regional word vector that the computer device needs to generate.
  • the first preset dimension of the regional word vector that the computer device needs to generate is N, and the number of the first weight matrix obtained by the computer device is N.
  • when the computer device generates the regional word vector of each dimension, there is a corresponding first weight matrix: when the computer device generates the regional word vector Z_1 of the first dimension, there is a corresponding first weight matrix W_1; when generating the regional word vector Z_2 of the second dimension, there is a corresponding first weight matrix W_2; ...; when the computer device generates the regional word vector Z_N of the N-th dimension, there is a corresponding first weight matrix W_N.
  • the computer device determines the target word vector input at the first moment, and obtains the historical hidden state corresponding to the historical word vector input by the computer device at the previous moment. It is understandable that the previous moment is not necessarily a moment immediately adjacent to the first moment; the previous moment is the moment at which the computer device input the word vector immediately preceding the target word vector.
  • For example, X_1 represents the word vector input by the computer device at the first moment, and X_2 represents the word vector input by the computer device at the second moment.
  • the computer device may generate the regional word vector of the first preset dimension based on the acquired historical hidden state and the first preset number of first weight matrices.
  • the first preset quantity is the same as the quantity of the first preset dimension.
  • the regional word vector of the first preset dimension as a whole can be a regional word vector matrix.
  • For example, a computer device needs to convert the target word vector X_T into regional word vectors of N dimensions; the obtained N-dimensional regional word vectors can be expressed as a regional word vector matrix, and Z_1 to Z_N in the regional word vector matrix are all regional word vectors of the target word vector X_T.
  • the computer device directly uses the first weight matrix used to generate the corresponding regional word vector to efficiently convert a single-dimensional target word vector into at least two-dimensional regional word vector.
  • a regional word vector of at least two dimensions is generated on the basis of the historical hidden state at the previous moment, so that the obtained regional word vector is more accurate.
  • generating regional word vectors of at least two dimensions of the target word vector based on the first weight matrices and the historical hidden state includes: splicing the target word vector with the historical hidden state to obtain a spliced word vector; and generating a regional word vector matrix according to the spliced word vector and the first weight matrices, where the regional word vector matrix includes regional word vectors of at least two dimensions.
  • the hidden state generated by the computer device at each moment is in the form of a vector. Therefore, after the computer device determines the target word vector and obtains the historical hidden state corresponding to the historical word vector at the previous moment, it can splice the target word vector with the historical hidden state at the previous moment to obtain the spliced word vector.
  • For example, the target word vector contains 8 vector elements and the historical hidden state contains 5 vector elements; when the computer device directly splices the target word vector with the historical hidden state, the resulting spliced word vector contains 13 vector elements.
  • the computer device multiplies the obtained spliced word vector with each first weight matrix to obtain the regional word vector matrix, which contains regional word vectors of multiple dimensions.
  • The regional word vector of each dimension can be computed as Z_i = W_i[X_t, h_{t-1}], where W_i represents the i-th first weight matrix and [X_t, h_{t-1}] is the spliced word vector obtained by splicing the target word vector X_t with the historical hidden state h_{t-1}. For example, if the computer device needs to generate N regional word vectors, then i is 1 to N, Z_i is Z_1 to Z_N, and W_i is W_1 to W_N; for instance, Z_N = W_N[X_t, h_{t-1}]. In this way, the computer device can obtain the regional word vector matrix, in which dimensions 1 to N respectively correspond to the regional word vectors Z_1 to Z_N. Here, t is an integer greater than 1.
  • each regional word vector in the regional word vector matrix lies in a different dimension, each regional word vector includes multiple vector elements, and each vector element is a matrix element of the dimension of the regional word vector to which it belongs. For example, Z_1 comprises three vector elements 0.3, 0.8 and 0.7; 0.3 is the matrix element Z_11 of the dimension where Z_1 is located, 0.8 is the matrix element Z_12, and 0.7 is the matrix element Z_13. In this way, the regional word vector matrix can be expressed with the regional word vectors Z_1 to Z_N as its rows.
  • in this embodiment, the computer device directly splices the target word vector with the hidden state at the previous moment to obtain the spliced word vector, and directly multiplies the spliced word vector with the at least two first weight matrices, so as to obtain the regional word vectors of at least two dimensions more efficiently and quickly.
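  • As an illustrative sketch of the computation Z_i = W_i[X_t, h_{t-1}] described above (the array shapes follow the 8-element / 5-element example, while the values and the number of regions are hypothetical, and the weight matrices would in practice be learned during training):

```python
import numpy as np

def generate_regional_word_vectors(x_t, h_prev, weight_matrices):
    """Generate one regional word vector Z_i = W_i [x_t, h_prev] per first weight matrix."""
    spliced = np.concatenate([x_t, h_prev])                   # spliced word vector [X_t, h_{t-1}]
    return np.stack([W @ spliced for W in weight_matrices])   # regional word vector matrix, one row per dimension

# Hypothetical sizes: 8-element target word vector, 5-element historical hidden state, N = 4 regions.
x_t = np.random.randn(8)
h_prev = np.random.randn(5)
Ws = [np.random.randn(6, 13) for _ in range(4)]      # each W_i maps the 13-element spliced vector to a 6-element regional word vector
Z = generate_regional_word_vectors(x_t, h_prev, Ws)  # shape (4, 6): regional word vectors Z_1 ... Z_4
```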
  • combining the regional word vectors to obtain combined regional word vectors of at least two dimensions includes: determining the edge weights between the regional word vectors; generating an adjacency matrix corresponding to the regional word vectors according to the determined edge weights; adding the edge weights of each dimension of the adjacency matrix to obtain a degree matrix; and generating combined regional word vectors of at least two dimensions based on the adjacency matrix and the degree matrix.
  • the edge weight refers to the weight of the edge connecting each vector node when each region word vector is used as a vector node.
  • the region vector combination method preset by the computer device may be a region vector combination method based on graph convolution (graph convolutional networks).
  • For example, a computer device generates 3-dimensional regional word vectors of the target word vector: Z_1, Z_2 and Z_3; the computer device then determines Z_1, Z_2 and Z_3 as the vector nodes 401, respectively.
  • the edges 402 connecting the vector nodes represent the relationship between the two connected vector nodes.
  • the computer device can calculate the similarity between the various vector nodes, and determine the similarity between the various vector nodes as the edge weights of the corresponding edges between the various vector nodes.
  • For example, the similarity can be calculated as W_ij = (Z_i^T Z_j) / (‖Z_i‖ ‖Z_j‖), where Z_i^T refers to the transposed vector of the regional word vector Z_i, ‖Z_i‖ refers to the L2 norm of the regional word vector Z_i, and ‖Z_j‖ refers to the L2 norm of the regional word vector Z_j. The computer device can obtain the similarity between the regional word vectors according to this formula, and determine the similarity between the vector nodes as the edge weights of the edges between the corresponding vector nodes. Here, i and j are positive integers.
  • S304 Generate an adjacency matrix corresponding to each region word vector according to the determined edge weights.
  • the adjacency matrix (Adjacency Matrix) is a matrix used to represent the neighbor relationship between vector nodes.
  • the computer device may use the determined edge weights as matrix elements to form an adjacency matrix. For example, the computer device generates N-dimensional regional word vectors of the target word vector, and the computer device determines the N regional word vectors as vector nodes, and calculates the edge weights between each of the N vector nodes. In this way, the adjacency matrix A formed by the computer device using the determined edge weights as matrix elements can be as shown in FIG. 5.
  • the degree matrix refers to a matrix formed by the degree of the vector nodes of each row or each column of the adjacency matrix, and the degree of the vector node of each row or each column is the sum of the matrix elements contained in each row or each column of the adjacency matrix.
  • each row in the adjacency matrix A includes the edge weights of the edges between a certain vector node and other vector nodes.
  • W 12 in FIG. 5 can represent the edge weight of the edge between the first node and the second node in each vector node.
  • For example, the first row in the adjacency matrix includes the edge weights between the vector node Z_1 and the other vector nodes: W_11, W_12, W_13, ..., W_1N. The computer device sums W_11 to W_1N to obtain the degree D_11 of the vector node Z_1, that is, D_11 = Σ_j W_1j = W_11 + W_12 + W_13 + ... + W_1N, where j is 1 to N.
  • the degrees of the vector nodes in each row of the adjacency matrix obtained by the computer device can be expressed as D_11, D_22, D_33, ..., D_NN, and the computer device can form a degree matrix D from D_11, D_22, D_33, ..., D_NN; in the degree matrix D, the diagonal elements are D_11, D_22, D_33, ..., D_NN and all other matrix elements are 0, as shown in FIG. 6.
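  • A brief sketch of the edge-weight, adjacency-matrix and degree-matrix computation described above (a straightforward reading of the cosine-similarity edge weights; variable names are illustrative):

```python
import numpy as np

def adjacency_and_degree(Z):
    """Z: regional word vector matrix, one regional word vector per row."""
    norms = np.linalg.norm(Z, axis=1, keepdims=True)  # L2 norm of each regional word vector
    A = (Z @ Z.T) / (norms @ norms.T)                 # edge weights W_ij = Z_i^T Z_j / (||Z_i|| ||Z_j||)
    D = np.diag(A.sum(axis=1))                        # degree matrix: D_ii = sum_j W_ij, off-diagonal elements 0
    return A, D
```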
  • S308 Generate a combined region word vector of at least two dimensions based on the adjacency matrix and the degree matrix.
  • the computer device may generate the combined regional word vector of the second preset dimension based on the obtained adjacency matrix and the degree matrix.
  • the number of the second preset dimension is the same as the number of the regional word vectors of the target word vector.
  • in this embodiment, the computer device uses each regional word vector of the target word vector as a vector node of the graph structure in the graph convolutional network, calculates the edge weight between each pair of vector nodes to obtain the edge weights between the regional word vectors, uses the obtained edge weights to generate the adjacency matrix, and calculates the degree matrix based on the adjacency matrix. In this way, the computer device can directly use the adjacency matrix and the degree matrix to efficiently generate the combined regional word vectors.
  • generating combined regional word vectors of at least two dimensions based on the adjacency matrix and the degree matrix includes: determining a regional word vector matrix corresponding to the regional word vectors; obtaining a second weight matrix used to generate the combined regional word vector matrix; and generating the combined regional word vector matrix according to the adjacency matrix, the degree matrix, the regional word vector matrix and the second weight matrix, where the combined regional word vector matrix includes combined regional word vectors of at least two dimensions.
  • the regional word vector matrix refers to a matrix in which the vector elements contained in each regional word vector are used as matrix elements.
  • the second weight matrix refers to a weight parameter in matrix form in the graph convolutional network that is learned as the system is trained and is used to generate the combined regional word vector matrix. That is, the second weight matrix is a system parameter obtained by training the system with sample data.
  • For example, the regional word vector matrix Z is shown as 700 in FIG. 7, where M is an integer.
  • the combined regional word vector matrix is generated from the degree matrix D, the adjacency matrix A, the regional word vector matrix Z and the second weight matrix through the activation function σ, where D refers to the degree matrix, A refers to the adjacency matrix, Z refers to the regional word vector matrix, and σ is the activation function.
  • the activation function σ may specifically be the sigmoid function "sigmoid(x)". The sigmoid function is a common S-shaped function in biology, also called the S-shaped growth curve, and is used as a threshold function in the recurrent neural network.
  • the computer device can use the activation function σ to obtain the combined regional word vector matrix O with the same dimension as the regional word vectors of the target word vector. Each row of the combined regional word vector matrix O serves as one dimension, and each dimension has one combined regional word vector.
  • in this embodiment, the regional word vectors are taken together as a regional word vector matrix, and the second weight matrix used to generate the combined regional word vector matrix is used, based on the adjacency matrix and the degree matrix, to generate the combined regional word vector matrix corresponding to the regional word vector matrix. The generated combined regional word vector matrix includes combined regional word vectors of at least two dimensions, which further improves the efficiency of generating the combined regional word vectors.
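  • The exact combination formula is not reproduced in this text; as a sketch under the assumption that a standard graph-convolution update of the form O = σ(D^(-1/2) A D^(-1/2) Z W) is used (the normalization choice is an assumption, not something confirmed by this application), the combination could look like:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def combine_regions_gcn(Z, W2):
    """Combine regional word vectors with a graph-convolution-style update.

    Z:  regional word vector matrix (N regions x M elements)
    W2: second weight matrix (M x M), assumed to be a trained parameter
    """
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    A = (Z @ Z.T) / (norms @ norms.T)                     # adjacency matrix from cosine-similarity edge weights
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))    # D^(-1/2) built from the degree matrix
    return sigmoid(D_inv_sqrt @ A @ D_inv_sqrt @ Z @ W2)  # combined regional word vector matrix O, same shape as Z
```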
  • combining each region word vector to obtain a combined region word vector of at least two dimensions includes:
  • S802 Determine at least two prediction vectors corresponding to each regional word vector according to the third weight matrix used to generate the combined regional word vector.
  • the third weight matrix refers to a weight parameter in matrix form in the capsule network that is learned as the system is trained and is used to generate the combined regional word vector matrix. That is, the third weight matrix is a system parameter obtained by training the system with sample data.
  • the prediction vector refers to the intermediate variable in the form of a vector in the process of generating the word vector matrix of the combined region.
  • the region vector combination mode preset by the computer device may be the region vector combination method based on the capsule network.
  • the computer device obtains each matrix element W^C_ij in the third weight matrix W^C used to generate the combined regional word vectors, where i is 1 to N, N is the total number of capsules, and j is 1 to the second preset dimension; the second preset dimension in this embodiment is greater than or equal to 2 and less than or equal to N, and ij represents the i-th row and j-th column of the third weight matrix W^C.
  • 901-904 in FIG. 9 are the initialization phase in which the computer device combines the regional word vectors of the target word vector according to the capsule-network-based regional vector combination method; the remainder of the combination calculation of the regional word vectors is the iterative calculation stage.
  • the computer device may generate, for each regional word vector Z_i, a prediction vector corresponding to each combined regional word vector to be generated, for example Ẑ_{j|i} = W^C_ij Z_i.
  • S804 Determine at least two logarithms of prior probability corresponding to each regional word vector.
  • the logarithm of the prior probability refers to a temporary variable in the form of a vector in the process of generating the word vector matrix of the combined region.
  • the computer device obtains each prior probability logarithm b_ij from the prior probability logarithm matrix B; the number of prior probability logarithms b_ij included in the matrix B is the total number of capsules multiplied by the second preset dimension. As shown at 902 in FIG. 9, since this is the initialization stage, all the prior probability logarithms b_ij in the prior probability logarithm matrix B are 0.
  • S806 Determine coupling coefficients corresponding to the word vectors in each region according to the logarithm of the prior probability.
  • the computer device then enters the iterative calculation phase and normalizes the obtained prior probability logarithms b_ij, for example with a softmax of the form C_ij = exp(b_ij) / Σ_k exp(b_ik), to obtain the coupling coefficients C_ij between each regional word vector and each corresponding combined regional word vector to be generated, where exp() refers to the exponential function with base e.
  • S808 Generate a candidate combination region word vector of at least two dimensions based on the coupling coefficient and the prediction vector.
  • the computer device generates the combined regional word vector O_j of the second preset dimension through the nonlinear activation function squash(S_j), where S_j is the weighted sum of the prediction vectors obtained with the coupling coefficients, S_j = Σ_i C_ij Ẑ_{j|i}.
  • the computer device repeatedly executes the above three steps from step S804 to step S808 to iteratively calculate the candidate combined regional word vectors until the preset iteration condition is met, at which point the iteration stops.
  • when the iteration stops, the candidate combined regional word vectors are determined to be the combined regional word vectors of at least two dimensions.
  • the preset iteration condition may be a preset number of iterations. For example, if the preset number of iterations is 3, the computer device stops the iteration after the third iteration and outputs the combined regional word vectors generated in the third iteration.
  • For another example, if the preset number of iterations is 5, the above three steps from step S804 to step S808 are repeated 5 times; after step S804 to step S808 have been executed for the fifth time, the execution stops, and the candidate combined regional word vectors obtained after the fifth execution of step S804 to step S808 are used as the combined regional word vectors of at least two dimensions.
  • in this embodiment, the computer device uses each regional word vector of the target word vector as a capsule in the capsule network, uses the third weight matrix used to generate the combined regional word vectors in the capsule network to generate at least two prediction vectors corresponding to each regional word vector, and obtains at least two initial prior probability logarithms corresponding to each regional word vector.
  • the iterative algorithm over the prior probability logarithms in the capsule network is then used to generate the final combined regional word vectors more efficiently and accurately.
  • in other words, the iterative algorithm over the prior probability logarithms in the capsule network efficiently performs multiple iterative calculations on the combined regional word vectors, and at the same time the multiple iterations allow complex linguistic laws to be captured better.
  • determining at least two prior probability logarithms corresponding to each regional word vector further includes: determining the scalar product between each combined regional word vector and each corresponding prediction vector; and adding each scalar product to the corresponding prior probability logarithm to obtain the newly determined prior probability logarithm corresponding to each regional word vector.
  • Ẑ_{j|i}·O_j refers to the scalar product between the prediction vector Ẑ_{j|i} and the combined regional word vector O_j.
  • For example, Ẑ_{1|1}·O_1 = a_1c_1 + a_2c_2 + ... + a_nc_n; adding the current b_11 and Ẑ_{1|1}·O_1 gives a new prior probability logarithm b_11 = b_11 + Ẑ_{1|1}·O_1.
  • the computer device sums the scalar product between each combined regional word vector and each corresponding prediction vector with the current prior probability logarithm to obtain the multiple re-determined prior probability logarithms. After iteration, the prior probability logarithms are more accurate, so that the final combined regional word vectors can be generated more efficiently and accurately.
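  • The description above follows the standard dynamic-routing scheme (softmax over the prior probability logarithms, a coupling-weighted sum of prediction vectors, a squash nonlinearity, and a dot-product update of the logarithms); the following sketch implements that scheme, with the exact form of the squash function being an assumption:

```python
import numpy as np

def squash(s):
    # standard capsule "squash" nonlinearity (assumed form)
    norm_sq = np.sum(s * s)
    return (norm_sq / (1.0 + norm_sq)) * (s / (np.sqrt(norm_sq) + 1e-9))

def combine_regions_capsule(Z, W_c, num_out, iterations=3):
    """Combine N regional word vectors into num_out combined regional word vectors by dynamic routing.

    Z:   regional word vectors, shape (N, d)
    W_c: per-pair third-weight matrices; W_c[i][j] maps Z[i] to a prediction vector (illustrative layout)
    """
    N, _ = Z.shape
    # prediction vectors Z_hat[i, j] = W_c[i][j] @ Z[i]
    Z_hat = np.stack([[W_c[i][j] @ Z[i] for j in range(num_out)] for i in range(N)])
    b = np.zeros((N, num_out))                                  # prior probability logarithms, initialised to 0
    for _ in range(iterations):                                 # preset number of iterations
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)    # coupling coefficients C_ij (softmax over b)
        S = np.einsum('ij,ijd->jd', c, Z_hat)                   # S_j = sum_i C_ij * Z_hat[i, j]
        O = np.stack([squash(S[j]) for j in range(num_out)])    # candidate combined regional word vectors
        b = b + np.einsum('ijd,jd->ij', Z_hat, O)               # b_ij += Z_hat[i, j] · O_j
    return O
```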
  • performing aggregation transformation processing on each combined regional word vector based on the feedforward neural network to obtain the aggregated word vector corresponding to the target word vector includes: transforming each combined regional word vector based on the feedforward neural network to obtain transformed combined regional word vectors; splicing the transformed combined regional word vectors to obtain a spliced word vector; and performing a linear transformation on the spliced word vector to obtain the aggregated word vector corresponding to the target word vector.
  • in this embodiment, the computer device performs a deeper transformation on each combined regional word vector based on the feedforward neural network to obtain the aggregated word vector, so that when the target hidden state generated from the aggregated word vector is used to capture complex linguistic laws, the capture rate is high.
  • transforming each combined regional word vector based on the feedforward neural network to obtain the transformed combined regional word vectors includes: performing a linear transformation on each combined regional word vector according to the fourth weight matrix and the first bias vector to obtain a temporary word vector corresponding to each combined regional word vector; selecting the maximum vector value between each temporary word vector and the vector threshold; and performing a linear transformation on each maximum vector value according to the fifth weight matrix and the second bias vector to obtain the transformed combined regional word vectors.
  • the fourth weight matrix refers to a weight parameter in matrix form in the feedforward neural network that is learned as the system is trained and is used for the first linear transformation of each combined regional word vector in the feedforward neural network.
  • the fifth weight matrix refers to a weight parameter in matrix form in the feedforward neural network that is learned as the system is trained and is used for the second linear transformation of each combined regional word vector in the feedforward neural network.
  • the first bias vector refers to a bias parameter in vector form in the feedforward neural network that is learned as the system is trained and is used for the first linear transformation of each combined regional word vector in the feedforward neural network.
  • the second bias vector refers to a bias parameter in vector form in the feedforward neural network that is learned as the system is trained and is used for the second linear transformation of each combined regional word vector in the feedforward neural network.
  • the fourth weight matrix and the fifth weight matrix are system parameters in matrix form obtained by training the system with sample data.
  • For example, the computer device obtains the fourth weight matrix W_1 and the first bias vector b_1 in the feedforward neural network, and uses them to perform the first linear transformation on each combined regional word vector O_j: O_j W_1 + b_1, obtaining the temporary word vector corresponding to each combined regional word vector. Each temporary word vector is then compared with the vector threshold, and the maximum vector value between each temporary word vector and the vector threshold is selected.
  • For example, the computer device compares each temporary word vector with the vector threshold 0 and selects the maximum vector value max(0, O_j W_1 + b_1) through the ReLU function "max(0, X)": where the temporary word vector is greater than the vector threshold 0, the temporary word vector is taken as the maximum vector value; where the vector threshold 0 is greater than the temporary word vector, the vector threshold 0 is taken as the maximum vector value.
  • the computer device then obtains the fifth weight matrix W_2 and the second bias vector b_2 in the feedforward neural network, and uses them to perform the second linear transformation on each maximum vector value, obtaining the transformed combined regional word vector f_j = max(0, O_j W_1 + b_1) W_2 + b_2.
  • in this embodiment, the computer device uses the fourth weight matrix and the first bias vector in the feedforward neural network to perform the first linear transformation on each combined regional word vector to obtain the temporary word vector, selects the maximum vector value between the temporary word vector and the vector threshold, and uses the fifth weight matrix and the second bias vector in the feedforward neural network to perform the second linear transformation on the maximum vector value to obtain the transformed combined regional word vector.
  • in this way, the computer device can use the combined regional word vectors to generate the aggregated word vector, so that when the target hidden state generated based on the aggregated word vector is used to capture complex linguistic laws, the capture rate is high.
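  • A sketch of the aggregation transformation described above (shapes are illustrative; the projection used for the final linear transformation of the spliced word vector is not named in this text, so W_out here is an assumed parameter):

```python
import numpy as np

def aggregate_transform(O, W1, b1, W2, b2, W_out):
    """Aggregation transformation of combined regional word vectors.

    O:      combined regional word vectors, shape (J, d)
    W1, b1: fourth weight matrix / first bias vector (first linear transformation)
    W2, b2: fifth weight matrix / second bias vector (second linear transformation)
    W_out:  assumed projection for the final linear transformation of the spliced word vector
    """
    F = np.maximum(0.0, O @ W1 + b1) @ W2 + b2   # f_j = max(0, O_j W1 + b1) W2 + b2 for every row
    spliced = F.reshape(-1)                      # splice the transformed combined regional word vectors
    return spliced @ W_out                       # aggregated word vector
```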
  • in an embodiment, the aggregated word vector includes a first aggregated word vector and a second aggregated word vector, and the first aggregated word vector is different from the second aggregated word vector. Generating the target hidden state corresponding to the target word vector based on the aggregated word vector includes: determining a candidate hidden state corresponding to the target word vector based on the first aggregated word vector and the corresponding first activation function; determining a gating parameter corresponding to the target word vector based on the second aggregated word vector and the corresponding second activation function; and generating the target hidden state corresponding to the target word vector according to the candidate hidden state, the gating parameter, and the historical hidden state of the historical word vector at the previous moment.
  • when the computer device generates the regional word vectors of the target word vector, it generates them based on the first weight matrices corresponding to the first aggregated word vector and, separately, based on the first weight matrices corresponding to the second aggregated word vector.
  • when the first weight matrices corresponding to the first aggregated word vector are used, the aggregated word vector corresponding to the target word vector finally obtained by the computer device is the first aggregated word vector M_h.
  • when the first weight matrices corresponding to the second aggregated word vector are used, the aggregated word vector corresponding to the target word vector finally obtained by the computer device is the second aggregated word vector M_g.
  • the target hidden state can be written as h_t = (1 - g_t) ⊙ h_{t-1} + g_t ⊙ h̃_t, where ⊙ is the element-wise product operator, (1 - g_t) ⊙ h_{t-1} refers to the element-wise product of (1 - g_t) and h_{t-1}, g_t ⊙ h̃_t refers to the element-wise product of g_t and h̃_t, h̃_t is the candidate hidden state, and g_t is the gating parameter.
  • in this embodiment, the candidate hidden state obtained based on the first aggregated word vector and the gating parameter obtained based on the different second aggregated word vector are more accurate, so that when the target hidden state obtained from these more accurate candidate hidden states and gating parameters is used to capture complex linguistic laws, the capture rate is high.
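  • A sketch of this hidden-state update (the choice of tanh as the first activation function and sigmoid as the second is an assumption; this text only speaks of "corresponding" activation functions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def generate_target_hidden_state(M_h, M_g, h_prev):
    """M_h: first aggregated word vector, M_g: second aggregated word vector, h_prev: historical hidden state h_{t-1}."""
    h_tilde = np.tanh(M_h)                        # candidate hidden state (assumed first activation function)
    g_t = sigmoid(M_g)                            # gating parameter (assumed second activation function)
    return (1.0 - g_t) * h_prev + g_t * h_tilde   # h_t = (1 - g_t) ⊙ h_{t-1} + g_t ⊙ h̃_t
```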
  • the method for generating hidden states in a recurrent neural network for language processing may include the following steps:
  • S1002 The computer device generates regional word vectors of the first preset dimension according to the regional word vector generation formula Z_i = W_i[X_t, h_{t-1}], where the first preset dimension is N and i is 1 to N.
  • S1004 The computer device performs a combination calculation on the regional word vectors of the first preset dimension according to a preset regional vector combination method to obtain combined regional word vectors of the second preset dimension J, where J can be equal to N or not equal to N.
  • when the preset regional vector combination method is the graph-convolution-based regional vector combination method, the second preset dimension J is equal to the first preset dimension N; when the preset regional vector combination method is the capsule-network-based regional vector combination method, the second preset dimension J is greater than or equal to 2 and less than or equal to the first preset dimension N.
  • S1006 The computer device performs a deep transformation on each combined regional word vector based on the feedforward neural network to obtain intermediate regional word vectors of the second preset dimension. Specifically, each combined regional word vector O_j is passed through a feedforward neural network (FNN): f_j = max(0, O_j W_1 + b_1) W_2 + b_2, where the second preset dimension is J and j is 1 to J.
  • S1008 The computer device splices the middle region word vectors of the second preset dimension to obtain a spliced word vector, and performs a linear transformation on the spliced word vector to obtain an aggregated word vector.
  • S1010 The computer device generates a target hidden state corresponding to the target word vector based on the aggregated word vector.
  • In an embodiment, the aggregated word vector is divided into a first aggregated word vector M_h and a second aggregated word vector M_g.
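  • Putting the above steps together, one hidden-state update at a single moment could be sketched as follows (this composes the illustrative functions from the earlier sketches; the parameter bundle and the choice of the graph-convolution variant are assumptions):

```python
def hidden_state_step(x_t, h_prev, params):
    """One update h_{t-1} -> h_t, composing the earlier illustrative sketches.

    params bundles the assumed trained weights for the M_h and M_g branches;
    each branch's W_out is assumed to project to the hidden-state size so that
    the gating in S1010 is well-formed."""
    def branch(w):
        Z = generate_regional_word_vectors(x_t, h_prev, w["first"])                    # S1002
        O = combine_regions_gcn(Z, w["second"])                                        # S1004 (graph-convolution variant)
        return aggregate_transform(O, w["W1"], w["b1"], w["W2"], w["b2"], w["W_out"])  # S1006-S1008
    M_h = branch(params["h_branch"])   # first aggregated word vector
    M_g = branch(params["g_branch"])   # second aggregated word vector
    return generate_target_hidden_state(M_h, M_g, h_prev)                              # S1010
```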
  • For example, the computer device receives variable-length voice information, converts the voice information received at each moment into text information, maps the text information to a target word vector, and generates the target hidden state of each target word vector through the steps of the method for generating hidden states in a recurrent neural network for language processing in any of the above embodiments.
  • the computer device can calculate the average hidden state of the multiple target hidden states, use the average hidden state as h_{t-1} and a zero vector as X_t, and calculate the first aggregated word vector M_h and the second aggregated word vector M_g based on h_{t-1} and X_t.
  • For example, the intermediate hidden state h_t is a vector containing 100 vector elements; the intermediate hidden state h_t can be multiplied by a weight matrix W_v of size 100×Y to obtain an intermediate vector containing Y vector elements, and the softmax function is applied to the intermediate vector to obtain Y probability values, each of which represents the probability of a word in the corresponding word list. For example, if Y is 10000, the computer device can obtain 10000 probability values.
  • the computer device uses the word corresponding to the largest probability value among the Y probability values as the first word that the computer device needs to respond to in the current human-machine dialogue.
  • the computer device then uses the word vector of the first word to be replied as X_t and the intermediate hidden state h_t as h_{t-1}, and continues with the step of calculating the first aggregated word vector M_h and the second aggregated word vector M_g based on h_{t-1} and X_t. Following the same calculation steps, the computer device can obtain the second word, the third word, the fourth word, and so on, to be replied, until the maximum probability value obtained meets the end condition, at which point the iteration ends. Further, the end condition may be that the word corresponding to the maximum probability value is a designated end symbol.
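  • A compact sketch of this reply-generation loop (the hidden-state update reuses the illustrative hidden_state_step above; the vocabulary size, embedding lookup and end symbol are assumptions):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def generate_reply(h_avg, W_v, embeddings, id_to_word, params, end_symbol="<eos>", max_len=50):
    """h_avg: average hidden state used as the initial h_{t-1};
    W_v: 100 x Y output weight matrix; embeddings: word-id -> word vector table."""
    x_t = np.zeros(embeddings.shape[1])               # X_t starts as a zero vector
    h_prev, reply = h_avg, []
    for _ in range(max_len):
        h_t = hidden_state_step(x_t, h_prev, params)  # intermediate hidden state h_t
        probs = softmax(h_t @ W_v)                    # Y probability values over the word list
        word_id = int(np.argmax(probs))               # word with the largest probability value
        if id_to_word[word_id] == end_symbol:         # designated end symbol ends the iteration
            break
        reply.append(id_to_word[word_id])
        x_t, h_prev = embeddings[word_id], h_t        # feed the chosen word back in as X_t
    return reply
```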
  • Figures 2, 3, 8 and 10 are schematic flowcharts of a method for generating hidden states in a recurrent neural network for language processing in an embodiment. It should be understood that although the various steps in the flowcharts of FIGS. 2, 3, 8 and 10 are shown in the sequence indicated by the arrows, these steps are not necessarily executed in that sequence. Unless specifically stated in this application, the execution of these steps is not strictly limited in order, and these steps can be executed in other orders. Moreover, at least some of the steps in FIGS. 2, 3, 8 and 10 may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily executed at the same time, but can be executed at different times, and their order of execution is not necessarily sequential; they may be executed in turn or alternately with other steps or with at least a part of the sub-steps or stages of other steps.
  • a device 1200 for generating hidden states in a recurrent neural network for language processing is provided.
  • the device can be implemented, through software, hardware, or a combination of both, as a part or all of the computer device; the device includes a regional word vector generation module 1201, a regional word vector combination module 1202, an aggregation transformation processing module 1203, and a target hidden state generation module 1204, wherein:
  • the regional word vector generating module 1201 is configured to generate a regional word vector of at least two dimensions of the target word vector input at the first moment.
  • the regional word vector combination module 1202 is used to combine various regional word vectors to obtain a combined regional word vector of at least two dimensions.
  • the aggregation transformation processing module 1203 is configured to perform aggregation transformation processing on each combined region word vector based on the feedforward neural network to obtain an aggregation word vector corresponding to the target word vector.
  • the target hidden state generation module 1204 is configured to generate the target hidden state corresponding to the target word vector based on the aggregated word vector.
  • the regional word vector generation module is further used to: obtain at least two first weight matrices, each of which is used to generate a corresponding regional word vector; determine the target word vector input at the first moment, and obtain the historical hidden state corresponding to the historical word vector at the previous moment; and generate regional word vectors of at least two dimensions of the target word vector based on the first weight matrices and the historical hidden state.
  • the regional word vector generation module is further used to: splice the target word vector with the historical hidden state to obtain a spliced word vector; generate a regional word vector matrix according to the spliced word vector and the first weight matrix; the regional word vector The matrix includes at least two-dimensional regional word vectors.
  • the regional word vector combination module is further used to: determine the edge weights between the regional word vectors; generate an adjacency matrix corresponding to the regional word vectors according to the determined edge weights; respectively add the edge weights of each dimension of the adjacency matrix to obtain a degree matrix; and generate combined regional word vectors of at least two dimensions based on the adjacency matrix and the degree matrix.
  • the regional word vector combination module is further used to: determine the regional word vector matrix corresponding to the regional word vectors; obtain the second weight matrix used to generate the combined regional word vector matrix; and generate the combined regional word vector matrix according to the adjacency matrix, the degree matrix, the regional word vector matrix and the second weight matrix, where the combined regional word vector matrix includes regional word vectors of at least two dimensions.
  • the regional word vector combination module is also used to: determine at least two prediction vectors corresponding to each regional word vector according to the third weight matrix used to generate the combined regional word vectors; determine at least two prior probability logarithms corresponding to each regional word vector; determine the coupling coefficients corresponding to the regional word vectors according to the prior probability logarithms; generate candidate combined regional word vectors of at least two dimensions based on the coupling coefficients and the prediction vectors; and, when the preset iteration condition is met, determine the candidate combined regional word vectors as the combined regional word vectors of at least two dimensions.
  • the regional word vector combination module is also used to: determine the scalar product between each combined regional word vector and each corresponding prediction vector; and add each scalar product to the corresponding prior probability logarithm to obtain the newly determined prior probability logarithms corresponding to the regional word vectors.
  • the aggregation transformation processing module is further used to: transform each combined regional word vector based on the feedforward neural network to obtain transformed combined regional word vectors; concatenate the transformed combined regional word vectors to obtain a spliced word vector; and perform a linear transformation on the spliced word vector to obtain the aggregated word vector corresponding to the target word vector.
  • the aggregation transformation processing module is further configured to: perform a linear transformation on each combined regional word vector according to the fourth weight matrix and the first bias vector to obtain the temporary word vector corresponding to each combined regional word vector; select the maximum vector value between each temporary word vector and the vector threshold; and perform a linear transformation on each maximum vector value according to the fifth weight matrix and the second bias vector to obtain the transformed combined regional word vectors.
  • the cluster word vector includes a first cluster word vector and a second cluster word vector, and the first cluster word vector is different from the second cluster word vector;
  • the target hidden state generation module is further used to: determine the candidate hidden state corresponding to the target word vector based on the first aggregate word vector and the corresponding first activation function; determine the target word vector based on the second aggregate word vector and the corresponding second activation function Corresponding gating parameters; generate the target hidden state corresponding to the target word vector according to the candidate hidden state, the gating parameter and the historical hidden state of the historical word vector at the previous moment.
  • Fig. 1 shows an internal structure diagram of a computer device in an embodiment.
  • the computer device may specifically be a terminal or a server.
  • the computer equipment includes a processor, a memory, and a network interface connected through a system bus. It can be understood that when the computer equipment is a terminal, the computer equipment may also include a display screen and an input device.
  • the memory includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium of the computer device stores an operating system and may also store a computer program; when the computer program is executed by the processor, the processor can implement the method for generating hidden states in the recurrent neural network for language processing.
  • a computer program may also be stored in the internal memory; when this computer program is executed by the processor, the processor can execute the method for generating hidden states in the recurrent neural network for language processing.
  • the display screen of the computer device can be a liquid crystal display screen or an electronic ink display screen.
  • the input device of the computer device can be a touch layer covering the display screen, a button, trackball, or touchpad set on the housing of the computer device, or an external keyboard, touchpad, or mouse.
  • FIG. 1 is only a block diagram of part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied.
  • the specific computer device may include more or fewer components than shown in the figure, combine some components, or have a different arrangement of components.
  • the apparatus for generating hidden states in a recurrent neural network for language processing can be implemented in the form of a computer program, and the computer program can run on the computer device shown in FIG. 1.
  • the memory of the computer device can store the program modules that make up the apparatus for generating hidden states in the recurrent neural network for language processing, for example, the regional word vector generation module 1201, the regional word vector combination module 1202, the aggregation transformation processing module 1203, and the target hidden state generation module 1204 shown in FIG. 12.
  • the computer program composed of each program module causes the processor to execute the steps in the method for generating hidden states in the recurrent neural network for language processing in each embodiment of the application described in this specification.
  • the computer device shown in FIG. 1 can, through the regional word vector generation module 1201 in the apparatus 1200 for generating hidden states in the recurrent neural network for language processing shown in FIG. 12, generate regional word vectors of at least two dimensions for the target word vector input at the first moment.
  • the computer device can combine the regional word vectors through the regional word vector combination module 1202 to obtain combined regional word vectors of at least two dimensions.
  • the computer device may perform aggregation transformation processing on the word vectors of each combined region based on the feedforward neural network through the aggregation transformation processing module 1203 to obtain the aggregation word vector corresponding to the target word vector.
  • the computer device may generate the target hidden state corresponding to the target word vector based on the aggregated word vector through the target hidden state generation module 1204.
  • a computer device including a memory and a processor.
  • the memory stores a computer program.
  • when the computer program is executed by the processor, the processor executes the steps of the method for generating hidden states in the recurrent neural network for language processing.
  • the steps of the method for generating hidden states in the recurrent neural network for language processing may be the steps in the method for generating hidden states in the recurrent neural network for language processing in each of the above embodiments.
  • a computer-readable storage medium is provided, in which a computer program is stored.
  • when the computer program is executed by a processor, the processor executes the steps of the method for generating hidden states in the recurrent neural network for language processing.
  • the steps of the method for generating hidden states in the recurrent neural network for language processing may be the steps in the method for generating hidden states in the recurrent neural network for language processing in each of the above embodiments.
  • a person of ordinary skill in the art can understand that all or part of the processes in the methods of the foregoing embodiments can be implemented by instructing relevant hardware through a computer program.
  • the program can be stored in a non-volatile computer readable storage medium. When the program is executed, it may include the procedures of the above-mentioned method embodiments.
  • any reference to memory, storage, database or other media used in the various embodiments provided in this application may include non-volatile and/or volatile memory.
  • Non-volatile memory can include read-only memory (ROM), programmable read-only memory (PROM), electrically programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory.
  • Volatile memory may include random access memory (Random Access Memory, RAM) or external cache memory.
  • RAM is available in many forms, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), SynchLink DRAM (SLDRAM), Rambus DRAM (RDRAM), and Direct Rambus DRAM (DRDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

本申请涉及一种用于语言处理的循环神经网络中隐状态的生成方法和装置,方法包括:生成第一时刻输入的目标词向量的至少两个维度的区域词向量;将各个区域词向量进行组合,得到至少两个维度的组合区域词向量;基于前馈神经网络将各个组合区域词向量进行聚集变换处理,得到目标词向量对应的聚集词向量;基于聚集词向量生成目标词向量对应的目标隐状态。该方法使得每一时刻的目标词向量都有对应的聚集词向量,这样便可以在聚集词向量的基础上生成目标词向量对应的目标隐状态。由于聚集词向量是对目标词向量进行多维度的转换处理得到的,使得利用该聚集词向量生成的目标隐状态在捕获复杂语言学规律时,对于复杂语言学规律的捕获率高。

Description

用于语言处理的循环神经网络中隐状态的生成方法和装置
本申请要求于2019年04月17日提交的申请号为201910309929.5、发明名称为“用于语言处理的循环神经网络中隐状态的生成方法和装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及计算机技术领域,特别是涉及一种用于语言处理的循环神经网络中隐状态的生成方法、装置、计算机可读存储介质和计算机设备。
背景技术
随着计算机技术的发展,出现了自然语言处理(Natural Language Processing,NLP)技术。自然语言处理(NLP)技术实现了人们长期以来所追求的“用自然语言与计算机进行通信”。但是,对于自然语言处理(NLP)技术而言,处理变长的词序列仍然是一个很大的挑战。
于是,出现了循环神经网络(Recurrent Neural Network,RNN)。循环神经网络是一类以序列(sequence)数据为输入,在序列的演进方向进行递归(recursion)且所有节点(循环单元)按链式连接的递归神经网络(recursive neural network)。循环神经网络的出现解决了对于变长的词序列的处理。
然而,传统的循环神经网络,都是利用单一空间的组合计算来生成各个时刻的隐状态。例如仅利用第一时刻的词向量和上一时刻的隐状态在单一空间上的组合来生成第一时刻的隐状态,使得对于复杂语言学规律的捕获率低。
发明内容
本申请提供了一种用于语言处理的循环神经网络中隐状态的生成方法、装置、计算机可读存储介质和计算机设备,该技术方案对于复杂语言学规律的捕获率高。该技术方案如下:
一方面,提供了一种用于语言处理的循环神经网络中隐状态的生成方法,该方法应用于计算机设备,该方法包括:
生成第一时刻输入的目标词向量的至少两个维度的区域词向量;
将各个所述区域词向量进行组合,得到至少两个维度的组合区域词向量;
基于前馈神经网络将各个所述组合区域词向量进行聚集变换处理,得到所 述目标词向量对应的聚集词向量;
基于所述聚集词向量生成所述目标词向量对应的目标隐状态。
另一方面,提供了一种用于语言处理的循环神经网络中隐状态的生成装置,该装置包括:
区域词向量生成模块,用于生成第一时刻输入的目标词向量的至少两个维度的区域词向量;
区域词向量组合模块,用于将各个所述区域词向量进行组合,得到至少两个维度的组合区域词向量;
聚集变换处理模块,用于基于前馈神经网络将各个所述组合区域词向量进行聚集变换处理,得到所述目标词向量对应的聚集词向量;
目标隐状态生成模块,用于基于所述聚集词向量生成所述目标词向量对应的目标隐状态。
另一方面,提供了一种计算机可读存储介质,上述计算机可读存储介质中存储有计算机程序,上述计算机程序被处理器执行时,使得处理器执行如上一个方面及其可选实施例任一所述的用于语言处理的循环神经网络中隐状态的生成方法。
另一方面,提供了一种计算机设备,上述计算机设备中包括存储器和处理器,上述存储器存储有计算机程序,上述计算机程序被上述处理器执行时,使得处理器执行如上一个方面及其可选实施例任一所述的用于语言处理的循环神经网络中隐状态的生成方法。
从以上技术方案可以看出,本申请实施例至少具有以下优点:
该方法通过生成第一时刻输入的目标词向量的至少两个维度的区域词向量,使得单一维度的目标词向量对应有多个维度的区域词向量,并将各个区域词向量进行区域组合,得到至少两个维度的组合区域词向量;再基于前馈神经网络将各个组合区域词向量进行聚集变换处理,得到目标词向量对应的聚集词向量,使得每一时刻的目标词向量都有对应的聚集词向量,这样便可以在聚集词向量的基础上生成目标词向量对应的目标隐状态,且由于聚集词向量是对目标词向量进行多维度的转换处理得到的,使得利用该聚集词向量生成的目标隐状态对于复杂语言学规律的捕获率高。
也就是说,该方法通过对目标词向量进行深层的多区域组合计算,得到在多维度转换后的聚集词向量,增强词向量中捕获的语言学规律,比如,增强词向量中的长距离依赖,从而使得利用聚集词向量生成的目标隐状态能够更大概率的捕获到复杂语言学规律。
附图说明
为了更清楚地说明本申请实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1是本申请中一个示例性实施例提供的计算机设备的结构框图;
图2是本申请中一个示例性实施例提供的用于语言处理的循环神经网络中隐状态的生成方法的流程图;
图3是本申请中另一个示例性实施例提供的用于语言处理的循环神经网络中隐状态的生成方法的流程图;
图4是本申请中一个示例性实施例提供的向量节点的示例图;
图5是本申请中一个示例性实施例提供的邻接矩阵的示例图;
图6是本申请中一个示例性实施例提供的度矩阵的示例图;
图7是本申请中一个示例性实施例提供的区域词向量矩阵的示例图;
图8是本申请中另一个示例性实施例提供的用于语言处理的循环神经网络中隐状态的生成方法的流程图;
图9是本申请中一个示例性实施例提供的计算区域词向量的方法示例图;
图10是本申请中另一个示例性实施例提供的用于语言处理的循环神经网络中隐状态的生成方法的流程图;
图11是本申请中一个示例性实施例提供的生成聚集词向量的方法示例图;
图12是本申请中一个示例性实施例提供的用于语言处理的循环神经网络中隐状态的生成装置的结构框图。
具体实施方式
为了使本申请的目的、技术方案及优点更加清楚明白,以下结合附图及实施例,对本申请进行进一步详细说明。应当理解,此处所描述的具体实施例仅仅用以解释本申请,并不用于限定本申请。
本申请提供的用于语言处理的循环神经网络中隐状态的生成方法,可以应用于如图1所示的计算机设备100中。该计算机设备100包括存储器101和处理器102。可选地,存储器101可以包括非易失性存储介质与内存储器。存储器101中存储有计算机程序,计算机程序被处理器102执行时,可以实现本申请提供的用于语言处理的循环神经网络中隐状态的生成方法。可选地,该计算机设备100还包括网络接口103,该网络接口103用于将计算机设备100接入有线或 者无线网络。可选地,该计算机设备100还包括系统总线104,其中,存储器101分别与处理器102、网络接口103之间通过系统总线104电性连接。该计算机设备100可以是终端,也可以是服务器。可以理解的是,当计算机设备100为终端时,该计算机设备100还可以包括显示屏和输入装置等。其中,终端可以但不限于是各个种个人计算机、笔记本电脑、智能手机、平板电脑和便携式可穿戴设备,服务器可以用独立的服务器或者是多个服务器组成的服务器集群来实现。
如图2所示,在一个实施例中,提供了一种用于语言处理的循环神经网络中隐状态的生成方法。本实施例主要以该方法应用于上述图1中的计算机设备来举例说明。参照图2,该用于语言处理的循环神经网络中隐状态的生成方法具体包括如下步骤:
S202,生成第一时刻输入的目标词向量的至少两个维度的区域词向量。
其中,词向量指对应的词在预定义的向量空间中的实数向量。例如,“狗”在预定义的向量空间中的实数向量可以为(0.2 0.2 0.4),则(0.2 0.2 0.4)便为“狗”的词向量。目标词向量指第一时刻输入的词向量。区域词向量指一个维度的词向量所对应的各个不同维度的词向量。第一时刻是输入目标词向量的时刻;示例性的,第一时刻可以包括当前时刻,即时钟当前所指示的时刻。
具体地,当计算机设备在第一时刻检测到有目标词向量输入时,计算机设备读取第一时刻输入的目标词向量,并触发该目标词向量的区域词向量生成指令。计算机设备根据该区域词向量生成指令将低维度的目标词向量转换为至少两个维度的区域词向量。这样计算机设备在各个时刻输入的目标词向量均对应有至少两个维度的区域词向量。
在一个实施例中,计算机设备在T个时刻输入的目标词向量整体为一个向量序列X={X 1,X 2,……,X T},其中,X 1为向量序列X中的第1个目标词向量,X 1表示计算机设备在第1个时刻输入的词向量;X 2为向量序列X中的第2个目标词向量,X 2表示计算机设备在第2个时刻输入的词向量;……;X T为向量序列X T中的第T个目标词向量,X T表示计算机设备在第T个时刻输入的词向量;其中,T为正整数。计算机设备在向量序列X中的每一个时刻均会生成该时刻输入的目标词向量的至少两个维度的区域词向量。
在一个实施例中,向量序列X={X 1,X 2,……,X T}中的每一个词向量都是预先对文本进行转换得到的。例如计算机设备为终端时,计算机设备中安装有用于通讯的社交应用程序,用于人机对话的子应用程序运行于该用于通讯的社交应用程序中。当计算机设备检测到用于人机对话的子应用程序接收到变长的语 音信息时,将每一时刻接收到的语音信息转换为文本信息,并将该文本信息映射为目标词向量,这样变长的语音信息最终会形成一个向量序列,向量序列中包括各个时刻接收到的语音信息的文本信息对应的目标词向量。
当然,当计算机设备为服务器时,该服务器可以接收其他终端已经转换得到的各个目标词向量,生成每一个时刻输入的目标词向量的至少两个维度的区域词向量。或者,当计算机设备为服务器时,该服务器还可以直接接收其他终端通过用于人机对话的子应用程序接收到的变长的语音信息,将每一时刻接收到的语音信息转换为文本信息,并将该文本信息映射为目标词向量,这样变长的语音信息最终会形成一个向量序列,向量序列中包括各个时刻接收到的语音信息的文本信息对应的目标词向量,并生成每一个时刻输入的目标词向量的至少两个维度的区域词向量。
在一个实施例中,区域词向量生成指令中可以携带有第一预设维度,计算机设备根据区域词向量生成指令将低维度的目标词向量转换为至少两个维度的区域词向量时,可以按照第一预设维度将低维度的目标词向量转换为第一预设维度的区域词向量。
例如,第一时刻为T,第一预设维度为N,计算机设备在第一时刻T检测到有目标词向量X T输入。则计算机设备需要将单一维度的目标词向量X T转换为N个维度的区域词向量。计算机设备可以生成目标词向量X T的N个区域词向量Z={Z 1,Z 2,……,Z N},其中的Z 1至Z N均为目标词向量X T的区域词向量。其中,N大于1。
S204,将各个区域词向量进行组合,得到至少两个维度的组合区域词向量。
其中,组合区域词向量指将各个区域词向量进行组合计算后所得到的词向量。例如目标词向量有N个维度的区域词向量,则计算机设备对N个维度的区域词向量进行组合计算后可以得到J个维度的组合区域词向量,J大于或等于2。
具体地,计算机设备预先设置有区域向量组合方式。当计算机设备生成目标词向量的区域词向量后,便获取预设的区域向量组合方式,区域向量组合方式中包括第二预设维度。计算机设备按照预设的区域向量组合方式对目标词向量的区域词向量进行组合计算,得到第二预设维度的组合区域词向量。区域向量组合方式指将各个区域词向量进行组合计算的方式。
S206,基于前馈神经网络将各个组合区域词向量进行聚集变换处理,得到目标词向量对应的聚集词向量。
其中,前馈神经网络是一种各个神经元分层排列的神经网络。聚集变换处理指将各个组合区域词向量进行聚集处理和变换处理的过程。聚集词向量指对各个组合区域词向量进行聚集处理和变换处理后所得到的词向量。
具体地,计算机设备可以基于前馈神经网络分别对各个组合区域词向量进行一次变换,得到与组合区域词向量的维度相同的中间区域词向量。例如目标词向量有J个维度的组合区域词向量O={O 1,O 2,……,O J},则计算机设备基于前馈神经网络先对各个组合区域词向量进行一次变换时,也可以得到J个维度的中间区域向量F={F 1,F 2,……,F J}。计算机设备将得到的各个中间区域词向量进行聚集处理,得到一个中间聚集词向量。计算机设备可以对得到的中间聚集词向量进行一次线性变换,便可以得到目标词向量对应的聚集词向量。
S208,基于聚集词向量生成目标词向量对应的目标隐状态。
其中,隐状态指循环神经网络的隐藏层输出的隐藏状态,隐藏状态指循环神经网络的系统状态(system status)。目标隐状态便指循环神经网络在第一时刻的系统状态(system status)。
具体地,计算机设备可以获取上一时刻的历史词向量的历史隐状态,计算机设备可以在历史隐状态的基础上加入目标词向量的聚集词向量对目标词向量的目标隐状态进行计算,生成目标词向量的目标隐状态。可以理解的是,上一时刻的历史词向量的历史隐状态也是基于历史词向量的聚集词向量生成的,而聚集词向量是对历史词向量进行多维度的转换处理得到的。
上述用于语言处理的循环神经网络中隐状态的生成方法,生成第一时刻输入的目标词向量的至少两个维度的区域词向量,使得单一维度的目标词向量对应有多个维度的区域词向量,并将各个区域词向量进行区域组合,得到至少两个维度的组合区域词向量。再基于前馈神经网络将各个组合区域词向量进行聚集变换处理,得到目标词向量对应的聚集词向量。
采用本申请提供的方案,使得每一时刻的目标词向量都有对应的聚集词向量,这样便可以在聚集词向量的基础上生成目标词向量对应的目标隐状态。由于聚集词向量是对目标词向量进行多维度的转换处理得到的,使得利用该聚集词向量生成的目标隐状态对于复杂语言学规律的捕获率高。例如在计算机设备处理手写识别、序列标注、情感分析、语言模型训练和机器翻译等任务中,即使遇到长距离依赖的语言结构,也可以高效地完成任务。
在一个实施例中,生成第一时刻输入的目标词向量的至少两个维度的区域词向量包括:获取至少两个第一权重矩阵,每个第一权重矩阵用于生成对应的区域词向量;确定第一时刻输入的目标词向量,并获取上一时刻的历史词向量对应的历史隐状态;基于第一权重矩阵和历史隐状态生成目标词向量的至少两个维度的区域词向量。
其中,第一权重矩阵指形式为矩阵的随着系统进行训练的权重参数,用于 生成对应的区域词向量。即第一权重矩阵是通过样本数对系统训练得到的矩阵形式的系统参数。历史词向量指计算机设备在第一时刻的上一时刻输入的词向量。历史隐状态指计算机设备在第一时刻的上一时刻输入的词向量对应的隐状态。
具体地,当计算机设备在第一时刻检测到有目标词向量输入时,计算机设备读取第一时刻输入的目标词向量,并触发该目标词向量的区域词向量生成指令。计算机设备根据该区域词向量生成指令获取用于生成区域词向量的第一权重矩阵,获取的第一权重矩阵的数量与计算机设备需要生成的区域词向量的维度的数量相同。
例如计算机设备需要生成的区域词向量的第一预设维度为N,则计算机设备获取的第一权重矩阵的数量为N。计算机设备在生成每一个维度的区域词向量时都有对应的第一权重矩阵:计算机设备在生成第一个维度的区域词向量Z 1时,有对应的第一权重矩阵W 1;计算机设备在生成第二个维度的区域词向量Z 2时,有对应的第一权重矩阵W 2;……;计算机设备在生成第N个维度的区域词向量Z N时,有对应的第一权重矩阵W N
计算机设备确定第一时刻输入的目标词向量,并获取计算机设备在第一时刻的上一时刻输入的历史词向量对应的历史隐状态。可以理解的是,上一时刻不必然是与第一时刻紧密相邻的时刻,上一时刻是计算机设备在当次输入目标词向量的前一次输入词向量时所对应的时刻。
例如计算机设备在T个时刻输入的目标词向量整体为一个向量序列X={X 1,X 2,……,X T},X 1表示计算机设备在第1个时刻输入的词向量,X 2表示计算机设备在第2个时刻输入的词向量。第1个时刻与第2个时刻之间可能会有很长的时间间隔,也可能只有很短的时间间隔,因此第1个时刻与第2个时刻间并不必然是在时间表上对应的紧密相邻的时刻。
计算机设备可以基于获取的历史隐状态和第一预设数量的第一权重矩阵生成第一预设维度的区域词向量。第一预设数量与第一预设维度的数量相同。第一预设维度的区域词向量整体可以为一个区域词向量矩阵,例如计算机设备需要将目标词向量X T转换为N个维度的区域词向量,则得到的N个维度的区域词向量可以表示为区域词向量矩阵
$$Z=\begin{bmatrix} Z_1 \\ Z_2 \\ \vdots \\ Z_N \end{bmatrix}$$
区域词向量矩阵中的Z 1至Z N均为目标词向量X T的区域词向量。
上述实施例中,计算机设备直接利用用于生成对应的区域词向量的第一权 重矩阵,高效地将单一维度的目标词向量转换为至少两个维度的区域词向量。并且是在上一时刻的历史隐状态的基础上生成至少两个维度的区域词向量,使得得到的区域词向量更加准确。
在一个实施例中,基于第一权重矩阵和历史隐状态生成目标词向量的至少两个维度的区域词向量包括:将目标词向量与历史隐状态进行拼接,得到拼接词向量;根据拼接词向量和第一权重矩阵生成区域词向量矩阵;区域词向量矩阵包括至少两个维度的区域词向量。
具体地,计算机设备生成的各个时刻的隐状态的形式均为向量,因此,计算机设备在确定目标词向量并获取到上一时刻的历史词向量对应的历史隐状态后,可以将第一时刻的目标词向量与上一时刻的历史隐状态进行拼接,得到拼接词向量。例如目标词向量中包含8个向量元素,历史隐状态中包含5个向量元素,计算机设备直接将目标词向量与历史隐状态进行拼接后,得到的拼接词向量包含13个向量元素。计算机设备将得到的拼接词向量分别与各个第一权重矩阵相乘,便能得到区域向量矩阵。区域向量矩阵中包含多个维度的区域词向量。
在一个实施例中,将目标词向量与历史隐状态进行拼接可以表示为[X t,h t-1],其中,X t为计算机设备第一时刻输入的目标词向量,h t-1为第一时刻的上一时刻的历史词向量对应的历史隐状态。则计算机设备可以按照如下公式生成目标词向量的至少两个维度的区域词向量:Z i=W i[X t,h t-1]。
其中,W i表示第一权重矩阵。例如计算机设备需要生成N个区域词向量,则i为1至N,Z i为Z 1至Z N,W i为W 1至W N。可以理解的是,在计算Z 1时,Z i=W i[X t,h t-1]为Z 1=W 1[X t,h t-1];在计算Z 2时,Z i=W i[X t,h t-1]为Z 2=W 2[X t,h t-1];……;在计算Z N时,Z i=W i[X t,h t-1]为Z N=W N[X t,h t-1]。这样计算机设备便能得到区域词向量矩阵
$$Z=\begin{bmatrix} Z_1 \\ Z_2 \\ \vdots \\ Z_N \end{bmatrix}$$
1至N分别表示对应的区域词向量Z 1至Z N所在的维度。其中,t为大于1的整数。
可以理解的是,区域词向量矩阵中的每一个区域词向量分别处于不同的维度,每一个区域词向量包含多个向量元素,每个向量元素均为所属区域词向量所在维度的矩阵元素。例如Z 1包含3个向量元素0.3、0.8和0.7,则0.3为Z 1所在第一维度的矩阵元素Z 11,0.8为Z 1所在第一维度的矩阵元素Z 12,0.7为Z 1所在第一维度的矩阵元素Z 13。以每个区域向量均包含3个向量元素为例, 则区域词向量矩阵具体可以表示为
$$Z=\begin{bmatrix} Z_{11} & Z_{12} & Z_{13} \\ Z_{21} & Z_{22} & Z_{23} \\ \vdots & \vdots & \vdots \\ Z_{N1} & Z_{N2} & Z_{N3} \end{bmatrix}$$
上述实施例中,计算机设备直接将目标词向量与上一时刻的隐状态进行拼接,得到拼接词向量,将拼接词向量与至少两个第一权重矩阵分别直接相乘,从而更加高效快捷地得到了至少两个维度的区域词向量。
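To make this step concrete, here is a minimal NumPy sketch of generating the regional word vectors Z_i = W_i[X_t, h_{t-1}]. It assumes the spliced vector is a plain concatenation of the target word vector and the previous hidden state, and every size and variable name below is illustrative rather than taken from the patent.

```python
import numpy as np

def generate_regional_word_vectors(x_t, h_prev, first_weight_matrices):
    """Z_i = W_i [X_t, h_{t-1}]: one regional word vector per first weight matrix."""
    spliced = np.concatenate([x_t, h_prev])                              # spliced word vector
    return np.stack([W_i @ spliced for W_i in first_weight_matrices])    # regional word vector matrix Z

rng = np.random.default_rng(0)
x_t, h_prev = rng.normal(size=8), rng.normal(size=5)               # 8-dim word vector, 5-dim hidden state
weights = [rng.normal(size=(3, 13)) for _ in range(4)]             # N=4 first weight matrices, M=3 each
print(generate_regional_word_vectors(x_t, h_prev, weights).shape)  # (4, 3)
```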
在一个实施例中,如图3所示,将各个区域词向量进行组合,得到至少两个维度的组合区域词向量包括:
S302,确定各个区域词向量间的边权重。
其中,边权重指将各个区域词向量作为向量节点时,用于连接各个向量节点的边的权重。
具体地,计算机设备预设的区域向量组合方式可以为基于图卷积(graph convolutional networks)的区域向量组合方式,计算机设备按照基于图卷积的区域向量组合方式将各个区域词向量确定为向量节点,各个向量节点之间有连接的边,构建一个图G=(V,E),其中,V表示向量节点集合,E表示边集合。
如图4所示,例如计算机设备生成了目标词向量的3个维度的区域词向量:Z 1、Z 2和Z 3,则计算机设备将Z 1、Z 2和Z 3分别确定为向量节点401。各个向量节点之间连接的边402代表连接的两个向量节点间的关系。计算机设备可以计算各个向量节点之间的相似度,将各个向量节点之间的相似度确定为对应的各个向量节点之间的边的边权重。
在一个实施例中,计算机设备可以按照以下公式计算确定各个区域词向量间的边权重:W ij=(Z i TZ j)/(||Z i||*||Z j||),其中,Z i和Z j均为目标词向量的任意一个区域词向量。Z i T指区域词向量Z i的转置向量。“||Z i||”指区域词向量Z i的L2范数,“||Z j||”指区域词向量Z j的L2范数。这样,计算机设备按照上述公式可以得到各个区域词向量间的相似度,将各个向量节点之间的相似度确定为对应的各个向量节点之间的边的边权重。其中,j为正整数。
S304,按照确定的各个边权重生成各个区域词向量共同对应的邻接矩阵。
其中,邻接矩阵(Adjacency Matrix)是用于表示向量节点之间相邻关系的矩阵。
具体地,计算机设备可以将确定的各个边权重作为矩阵元素,形成一个邻接矩阵。例如计算机设备生成了目标词向量的N个维度的区域词向量,则计算机设备将N个区域词向量分别确定为向量节点,计算N个向量节点中各个向量节点之间的边权重。这样,计算机设备将确定的各个边权重作为矩阵元素所形成的邻接矩阵A可以如图5所示。
S306,分别将邻接矩阵中各个维度的各个边权重进行加和,得到度矩阵。
其中,度矩阵指由邻接矩阵各个行或各个列的向量节点的度形成的矩阵,各个行或各个列的向量节点的度为邻接矩阵中各个行或者各个列包含的矩阵元素的和。
具体地,如图5所示,邻接矩阵A中的每一行都包括某一个向量节点与其他向量节点之间的边的边权重。例如图5中的W 12可以表示各个向量节点中的第1个节点与第2个节点之间的边的边权重。计算机设备得到邻接矩阵后,便可以将邻接矩阵中各个行所包括的边权重进行加和,得到各个行对应的向量节点的度。例如邻接矩阵中的第一行包括的是向量节点Z 1与其他向量节点之间的边权重:W 11,W 12,W 13,……,W 1N,则计算机设备将W 11至W 1N进行加和,便能得到向量节点Z 1的度D 11
进一步地,计算机设备计算可以按照以下公式计算各个行对应的向量节点的度:D ii=∑ jW ij,其中,W ij指邻接矩阵中第i行第j列的矩阵参数(该矩阵参数为向量节点中的第i个向量节点与第j个向量节点间的边权重)。例如计算图4中邻接矩阵A的第一行表示的向量节点的度时,D 11=∑ jW ij中的j为1至N,则D 11=W 11+W 12+W 13+……+W 1N
计算机设备得到的邻接矩阵中各个行的向量节点的度可以表示为:D 11,D 22,D 33,……,D NN,计算机设备基于“D 11,D 22,D 33,……,D NN”便可以形成度矩阵D,形成的度矩阵D中D 11,D 22,D 33,……,D NN之外的其他矩阵元素均为0,如图6所示。
S308,基于邻接矩阵和度矩阵生成至少两个维度的组合区域词向量。
具体地,计算机设备可以基于得到的邻接矩阵和度矩阵生成第二预设维度的组合区域词向量,本实施例中的第二预设维度的数量与目标词向量的区域词向量的数量相同。
上述实施例中,计算机设备将目标词向量的各个区域词向量作为图卷积网络中的图结构的向量节点,可以计算出各个向量节点之间的边权重,便得到了各个区域词向量间的边权重,利用得到的边权重生成邻接矩阵,并基于邻接矩阵计算出度矩阵。这样计算机设备可以直接利用邻接矩阵和度矩阵高效地生成组合区域词向量。
在一个实施例中,基于邻接矩阵和度矩阵生成至少两个维度的组合区域词向量包括:确定各个区域词向量共同对应的区域词向量矩阵;获取用于生成组合区域词向量矩阵的第二权重矩阵;根据邻接矩阵、度矩阵、区域词向量矩阵和第二权重矩阵生成组合区域词向量矩阵;组合区域词向量矩阵中包括至少两 个维度的区域词向量。
其中,区域词向量矩阵指由各个区域词向量包含的向量元素作为矩阵元素的矩阵。第二权重矩阵指图卷积网络中的形式为矩阵的随着系统进行训练的权重参数,用于生成组合区域词向量矩阵。即第二权重矩阵是通过样本数据对系统进行训练得到的系统参数。
具体地,计算机设备将各个区域词向量包含的向量元素作为矩阵元素,形成一个区域词向量矩阵。例如计算机设备生成了目标词向量X T的N个区域词向量Z={Z 1,Z 2,……,Z N},每个区域向量中包含M个向量元素,计算机设备将N个区域词向量Z中各个区域词向量所包含的向量元素作为矩阵元素,形成区域词向量矩阵Z,区域词向量矩阵Z如图7中的700所示。其中,M为整数。
计算机设备获取用于生成组合区域词向量矩阵的第二权重矩阵Wg,按照以下公式生成组合区域词向量矩阵O:O=σ(D -1/2AD -1/2ZW g)。其中,D指度矩阵,A指邻接矩阵,Z指区域词向量矩阵,σ为激活函数。进一步地,激活函数σ具体可以是sigmoid函数“sigmoid(x)”。sigmoid函数是一个在生物学中常见的S型函数,也称为S型生长曲线,本实施例中,sigmoid函数作为循环神经网络中的阈值函数。
计算机设备利用激活函数σ可以得到与目标词向量的区域词向量维度相同的组合区域词向量矩阵O,组合区域词向量矩阵O的每一行作为一个维度,每一个维度具有一个组合区域词向量。例如N个区域词向量对应的组合区域词向量矩阵
$$O=\begin{bmatrix} O_1 \\ O_2 \\ \vdots \\ O_J \end{bmatrix}$$
包含J个组合区域词向量,J与N的大小相同。则计算机设备可以得到J个组合区域词向量O={O 1,O 2,……,O J}。
上述实施例中,将各个区域词向量整体作为一个区域词向量矩阵,并利用用于生成组合区域词向量矩阵的第二权重矩阵,基于邻接矩阵和度矩阵生成区域词向量矩阵对应的组合区域词向量矩阵,并且生成的组合区域词向量矩阵包括至少两个维度的区域词向量,进一步提高了生成组合区域词向量的高效性。
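The graph-convolution combination described above can be sketched as follows, as an illustrative approximation rather than the patented implementation: the edge weights are cosine similarities, the degree matrix comes from the row sums of the adjacency matrix, and O = sigmoid(D^{-1/2} A D^{-1/2} Z W_g). The toy data is kept non-negative so that every degree entry stays positive, a corner case the text does not address.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def combine_by_graph_convolution(Z, W_g):
    """O = sigmoid(D^{-1/2} A D^{-1/2} Z W_g): cosine-similarity adjacency matrix A,
    degree matrix D from its row sums, second weight matrix W_g."""
    norms = np.linalg.norm(Z, axis=1, keepdims=True)        # L2 norms of the regional word vectors
    A = (Z @ Z.T) / (norms @ norms.T)                       # edge weights W_ij
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))      # D^{-1/2}
    return sigmoid(D_inv_sqrt @ A @ D_inv_sqrt @ Z @ W_g)   # combined regional word vector matrix O

rng = np.random.default_rng(0)
Z = rng.random(size=(4, 3))      # N=4 non-negative regional word vectors (keeps all degrees positive)
W_g = rng.normal(size=(3, 3))    # second weight matrix
print(combine_by_graph_convolution(Z, W_g).shape)   # (4, 3): in this variant J equals N
```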
在一个实施例中,如图8所示,将各个区域词向量进行组合,得到至少两个维度的组合区域词向量包括:
S802,根据用于生成组合区域词向量的第三权重矩阵确定各个区域词向量对应的至少两个预测向量。
其中,第三权重矩阵指胶囊网络(capsule networks)中的形式为矩阵的随着系统进行训练的权重参数,用于生成组合区域词向量矩阵。即第三权重矩阵是通过样本数据对系统进行训练得到的系统参数。预测向量指生成组合区域词向量矩阵的过程中的形式为向量的中间变量。
具体地,计算机设备预设的区域向量组合方式可以为基于胶囊网络的区域向量组合方式,计算机设备按照基于胶囊网络的区域向量组合方式将各个区域词向量作为胶囊网络中的胶囊,例如计算机设备生成了目标词向量的N个区域词向量Z={Z 1,Z 2,……,Z N},便有N个胶囊:Z 1,Z 2,……,Z N
计算机设备获取用于生成组合区域词向量的第三权重矩阵W C中的各个矩阵元素W C ij,其中,i为1至N,N为胶囊的总数量,j为1至第二预设维度的数量,本实施例中第二预设维度的数量大于或等于2且小于或等于N,ij表示第三权重矩阵W C的第i行第j列。
如图9所示,图9中的901-904为计算机设备按照基于胶囊网络的区域向量组合方式对目标词向量的区域词向量进行组合计算的初始化阶段,905-910为计算机设备按照基于胶囊网络的区域向量组合方式对目标词向量的区域词向量进行组合计算的迭代计算阶段。在初始化阶段,如图9中的903所示,计算机设备可以基于胶囊网络中的各个胶囊,生成获取的第三权重矩阵中的各个矩阵元素W C ij对应的预测向量Z j|i
S804,确定各个区域词向量对应的至少两个先验概率对数。
其中,先验概率对数指生成组合区域词向量矩阵的过程中的形式为向量的临时变量。
具体地,计算机设备从先验概率对数矩阵B中获取各个先验概率对数b ij,先验概率对数矩阵B中包括的先验概率对数b ij的数量为胶囊的总数量*第二预设维度的数量。如图9中的902所示,由于此时处于初始化阶段,先验概率对数矩阵B中的所有先验概率对数b ij均为0。
S806,根据先验概率对数确定各个区域词向量对应的耦合系数。
具体地,计算机设备进入迭代计算阶段。在迭代计算阶段,如图9中的905所示,计算机设备对获取的各个先验概率对数b ij进行归一化处理,公式如下:
$$C_{ij}=\frac{\exp(b_{ij})}{\sum_{k}\exp(b_{ik})}$$
得到各个区域词向量分别与对应的各个待生成的组合区域词向量间的耦合系数C ij。其中,exp()是指以e为底数的指数函数。
S808,基于耦合系数和预测向量生成至少两个维度的候选组合区域词向量。
具体地,计算机设备得到耦合系数Cij后,如图9中的906所示,按照以下 公式计算加权和S j:S j=∑ iC ijZ j|i。其中,∑是求和符号。如图9中的907所示,计算机设备通过非线性激活函数squash(Sj)生成第二预设维度的组合区域词向量O j。其中,
$$O_j=\mathrm{squash}(S_j)=\frac{\|S_j\|^{2}}{1+\|S_j\|^{2}}\cdot\frac{S_j}{\|S_j\|}$$
其中,“||S j||”是指计算S j的范数。
S810,重复执行以上步骤S804至步骤S808对候选组合区域词向量进行迭代计算,直至符合预设迭代条件时停止迭代,将停止迭代时的至少两个维度的候选组合区域词向量确定为至少两个维度的组合区域词向量。
也就是说,计算机设备重复执行以上步骤S804至步骤S808这三个步骤,以对候选组合区域词向量进行迭代计算,直至符合预设迭代条件时停止迭代,将停止迭代时的至少两个维度的候选组合区域词向量确定为至少两个维度的组合区域词向量。
需要说明的是,对候选组合区域词向量进行迭代计算时,需要重新确定区域词向量与组合区域词向量之间的先验概率对数。具体地,如图9中的步骤908所示,计算机设备得到组合区域词向量Oj后,执行步骤909,按照以下公式重新确定各个区域词向量分别与各个组合区域词向量之间的先验概率对数:b ij=b ij+Z j|iO j
具体地,重新确定先验概率对数b ij后,返回图9中905的步骤,直到符合预设迭代条件时停止迭代,输出最后一次生成的各个组合区域词向量。示例性的,预设迭代条件可以是预设迭代次数,例如,预设迭代次数为3次,则计算机设备检测到当前迭代次数已达到预设迭代次数时,停止迭代,输出第3次生成的各个组合区域词向量。
例如,预设迭代次数为5次,则对上述步骤S804至步骤S808这三个步骤重复执行5次,在第5次执行步骤S804至步骤S808之后,停止再次执行,并将第5次执行步骤S804至步骤S808之后得到的候选组合区域词向量作为至少两个维度的组合区域词向量。
上述实施例中,计算机设备将目标词向量的各个区域词向量作为胶囊网络中的胶囊,利用胶囊网络中用于生成组合区域词向量的第三权重矩阵生成各个区域词向量对应的至少两个预测向量,并获取各个区域词向量对应的初始化的至少两个先验概率对数。在基于先验概率对数生成至少两个维度的组合区域词向量的过程中,利用胶囊网络中对于先验概率对数的迭代算法更加高效准确地生成最终的组合区域词向量。
即在基于先验概率对数生成至少两个维度的组合区域词向量的过程中,利用胶囊网络中对于先验概率对数的迭代算法,高效地对组合区间词向量进行多 次的迭代计算,同时通过多次迭代更好的捕获复杂语言学规律。
在一个实施例中,确定各个所述区域词向量对应的至少两个先验概率对数,还包括:确定各个组合区域词向量与对应的各个预测向量间的标量积;将各个标量积与对应的先验概率对数进行加和,得到重新确定的各个区域词向量对应的先验概率对数。
具体地,如图9中的步骤908所示“Z j|i·O j”指的就是预测向量Z j|i与组合区域词向量O j之间的标量积,再将得到的标量积分别与当前的各个先验概率对数进行加和,重新得到多个先验概率对数。
例如,预测向量Z 1|1=(a 1,a 2,……,a n),当前得到的组合区域词向量O 1=(c 1,c 2,……,c n),相应地,标量积Z 1|1·O 1=a 1c 1+a 2c 2+……+a nc n;将当前的b 11与Z 1|1·O 1进行加和,得到新的先验概率对数b 11=b 11+Z 1|1·O 1
上述实施例中,计算机设备将各个组合区域词向量与对应的各个预测向量间的标量积与当前的先验概率对数进行加和,得到多个重新确定的先验概率对数,经过多次迭代后先验概率对数的准确率更高,这样便可以更加高效准确地生成最终的组合区域词向量。
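A compact sketch of this capsule-style combination with dynamic routing follows. The squash non-linearity and the direction of the softmax over the prior-probability logits are the usual capsule-network conventions, assumed here because the patent gives those formulas only as images; the tensor shapes and the fixed iteration count are likewise illustrative.

```python
import numpy as np

def squash(s):
    """Capsule squashing non-linearity (assumed standard form)."""
    norm_sq = float(np.sum(s * s))
    return (norm_sq / (1.0 + norm_sq)) * (s / (np.sqrt(norm_sq) + 1e-9))

def combine_by_capsule_routing(Z, W_c, num_iters=3):
    """Dynamic-routing style combination: prediction vectors Z_{j|i}, coupling coefficients C_ij,
    weighted sums S_j, squashed outputs O_j, and the update b_ij += Z_{j|i} . O_j each iteration."""
    U = np.einsum('ijkm,im->ijk', W_c, Z)                 # prediction vectors, shape (N, J, M)
    b = np.zeros(U.shape[:2])                             # prior-probability logits, initialised to 0
    for _ in range(num_iters):
        C = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)   # coupling coefficients
        S = np.einsum('ij,ijk->jk', C, U)                      # weighted sums S_j
        O = np.stack([squash(s) for s in S])                   # candidate combined regional word vectors
        b = b + np.einsum('ijk,jk->ij', U, O)                  # re-determine the prior-probability logits
    return O

rng = np.random.default_rng(0)
Z = rng.normal(size=(4, 3))                # N=4 regional word vectors of size 3
W_c = rng.normal(size=(4, 2, 3, 3))        # third weight matrix: one 3x3 block per (region i, output j)
print(combine_by_capsule_routing(Z, W_c).shape)   # (2, 3): J=2 combined regional word vectors
```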
在一个实施例中,基于前馈神经网络将各个组合区域词向量进行聚集变换处理,得到目标词向量对应的聚集词向量,包括:基于前馈神经网络对各个组合区域词向量进行变换,得到变换后的组合区域词向量;将各个变换后的组合区域词向量进行拼接,得到拼接后的词向量;对拼接后的词向量进行线性变换,得到目标词向量对应的聚集词向量。
具体地,计算机设备按照基于前馈神经网络的预设公式对各个组合区域词向量O={O 1,O 2,……,O J}进行更深层的变换,得到变换后的组合区域词向量F=={f 1,f 2,……,f J}。计算机设备将F中的所有变换后的组合区域词向量进行拼接,得到一个拼接后的词向量(f 1 f 2 …… f J)。再对拼接后的词向量(f 1 f 2 …… f J)进行一次线性变换,得到目标词向量对应的聚集词向量。
上述实施例中,计算机设备基于前馈神经网络对各个组合区域词向量进行了更深层次的变换,得到聚集词向量,使得利用基于聚集词向量生成的目标隐状态捕获复杂语言学规律时,对于复杂语言学规律的捕获率高。
在一个实施例中,基于前馈神经网络对各个组合区域词向量进行变换,得到变换后的组合区域词向量,包括:根据第四权重矩阵和第一偏置向量对各个组合区域词向量进行线性变换,得到各个组合区域词向量对应的临时词向量; 分别选取各个临时词向量与向量阈值中的最大向量值;根据第五权重矩阵和第二偏置向量对各个最大向量值分别进行线性变换,得到变换后的组合区域词向量。
其中,第四权重矩阵指前馈神经网络中的形式为矩阵的随着系统进行训练的权重参数,用于在前馈神经网络中对各个组合区域向量进行第一次的线性变换。第五权重矩阵指前馈神经网络中的形式为矩阵的随着系统进行训练的权重参数,用于在前馈神经网络中对各个组合区域向量进行第二次的线性变换。第一偏置向量指前馈神经网络中的形式为向量的随着系统进行训练的偏置参数,用于在前馈神经网络中对各个组合区域向量进行第一次的线性变换。第二偏置向量指前馈神经网络中的形式为向量的随着系统进行训练的偏置参数,用于在前馈神经网络中对各个组合区域向量进行第二次的线性变换。其中,第四权重矩阵与第五权重矩阵是通过样本数对系统训练得到的矩阵形式的系统参数。
具体地,计算机设备获取前馈神经网络中的第四权重矩阵W 1和第一偏置向量b 1,利用第四权重矩阵W 1和第一偏置向量b 1对各个组合区域词向量O j进行第一次线性变换:O jW 1+b 1,得到各个组合区域词向量对应的临时词向量。将各个临时词变量分别与向量阈值做比较,选取各个临时词变量与向量阈值间的最大向量值。
例如,向量阈值为0,则计算机设备将各个临时词变量分别与向量阈值0做比较,通过Relu函数“max(0,X)”选取最大向量值max(0,O jW 1+b 1),将大于向量阈值0的临时词变量作为该临时词变量与向量阈值0中的最大向量值,将大于临时词变量的向量阈值0作为该临时词变量与向量阈值0中的最大向量值。
计算机设备获取前馈神经网络中的第五权重矩阵W 2和第二偏置向量b 2,利用第五权重矩阵W 2、以及第二偏置向量b 2对各个组合区域词向量O j进行第二次线性变换,得到二次线性变换后的组合区域向量f J:f J=max(0,O jW 1+b 1)W 2+b 2,进而得到变换后的组合区域词向量F={f 1,f 2,……,f J}。
上述实施例中,计算机设备利用前馈神经网络中的第四权重矩阵和第一偏置向量对各个组合区域词向量进行了第一次线性变换后,得到临时词向量,并选取临时词向量与向量阈值中的最大向量值,利用前馈神经网络中的第五权重矩阵和第二偏置向量对最大向量值进行第二次线性变换,得到的变换后的组合区域词向量。计算机设备可以利用该组合区域词向量生成聚集词向量,使得利用基于聚集词向量生成的目标隐状态捕获复杂语言学规律时,对于复杂语言学规律的捕获率高。
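The two linear transformations with max(0, ·) in between, followed by concatenation and one final linear transformation, can be sketched like this; whether the final linear map carries a bias is not stated in the text, so the bias below is an assumption, and all sizes are illustrative.

```python
import numpy as np

def aggregate(O, W1, b1, W2, b2, W_out, b_out):
    """f_j = max(0, O_j W1 + b1) W2 + b2 for every combined regional word vector,
    then concatenate the f_j and apply one final linear transformation."""
    F = np.maximum(0.0, O @ W1 + b1) @ W2 + b2    # transformed combined regional word vectors
    return F.reshape(-1) @ W_out + b_out          # aggregated word vector

rng = np.random.default_rng(0)
O = rng.normal(size=(2, 3))                                      # J=2 combined regional word vectors
W1, b1 = rng.normal(size=(3, 6)), rng.normal(size=6)             # fourth weight matrix / first bias vector
W2, b2 = rng.normal(size=(6, 3)), rng.normal(size=3)             # fifth weight matrix / second bias vector
W_out, b_out = rng.normal(size=(2 * 3, 5)), rng.normal(size=5)   # final linear map (bias assumed)
print(aggregate(O, W1, b1, W2, b2, W_out, b_out).shape)          # (5,)
```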
在一个实施例中,聚集词向量包括第一聚集词向量和第二聚集词向量,第 一聚集词向量与第二聚集词向量不同;基于聚集词向量生成目标词向量对应的目标隐状态包括:基于第一聚集词向量和对应的第一激活函数确定目标词向量对应的候选隐状态;基于第二聚集词向量和对应的第二激活函数确定目标词向量对应的门控参数;根据候选隐状态、门控参数和上一时刻的历史词向量的历史隐状态生成目标词向量对应的目标隐状态。
具体地,计算机设备在生成目标词向量的区域词向量时,要分别基于第一聚集词向量对应的第一权重矩阵和第二聚集词向量对应的第一权重矩阵生成聚集词向量。当计算机设备基于第一聚集词向量对应的第一权重矩阵生成区域词向量时,计算机设备最后得到的目标词向量对应的聚集词向量为第一聚集词向量M h。当计算机设备基于第二聚集词向量对应的第一权重矩阵生成区域词向量时,计算机设备最后得到的目标词向量对应的聚集词向量为第二聚集词向量M g
计算机设备通过第一激活函数tanh确定目标词向量的候选隐状态h~ t:h~ t=tanh(M h)。计算机设备通过第二激活函数σ确定目标词向量的门控参数g t:g t=σ(M g)。
进一步地,
Figure PCTCN2020081177-appb-000007
Figure PCTCN2020081177-appb-000008
计算机设备得到目标词向量对应的候选隐状态h~ t和门控参数g t后,按照以下公式计算目标词向量的目标隐状态h t:h t=(1-g t)⊙h t-1+g t⊙h~ t。其中,⊙是元素积运算符,“(1-g t)⊙h t-1”指对(1-g t)和h t-1进行元素积的运算,“g t⊙h~ t”指对g t和h~ t进行元素积的运算。
上述实施例中,由于第一聚集词向量和第二聚集词向量都是对目标词向量进行多维度的转换处理得到的,这样基于第一聚集词向量得到的候选隐状态和基于第二聚集词向量得到的门控参数更加的精确,这样利用基于更加精确的候选隐状态和门控参数得到的目标隐状态捕获复杂语言学规律时,对于复杂语言学规律的捕获率高。
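The gated update itself is short enough to show directly; this sketch only assumes that M_h, M_g, and h_{t-1} already share the same dimensionality.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def target_hidden_state(M_h, M_g, h_prev):
    """h~_t = tanh(M_h), g_t = sigmoid(M_g), h_t = (1 - g_t) * h_{t-1} + g_t * h~_t (elementwise)."""
    h_tilde = np.tanh(M_h)      # candidate hidden state
    g = sigmoid(M_g)            # gating parameters
    return (1.0 - g) * h_prev + g * h_tilde

rng = np.random.default_rng(0)
print(target_hidden_state(rng.normal(size=5), rng.normal(size=5), rng.normal(size=5)))
```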
在一个实施例中,如图10所示,用于语言处理的循环神经网络中隐状态的生成方法可以包括以下步骤:
S1002,计算机设备按照区域词向量生成公式生成第一预设维度的区域词向量。
具体地,如图11中的1101所示,区域词向量生成公式为:Z i=W i[X t,h t-1]。例如第一预设维度为N,则i为1至N,计算机设备得到的N个维度的区域词向量可以表示为Z={Z 1,Z 2,……,Z N}。
S1004,计算机设备按照预设的区域向量组合方式对第一预设维度的区域词向量进行组合计算,得到第二预设维度的组合区域词向量。
具体地,如图11中的1102所示,计算机设备对得到的N个维度的区域词向量Z={Z 1,Z 2,……,Z N}进行组合计算,例如第二预设维度为J,则计算机设备可以得到J个组合区域词向量O={O 1,O 2,……,O J}。J可以等于N,也可以不等于N。例如当预设的区域向量组合方式为基于图卷积的区域向量组合方式时,第二预设维度J等于第一预设维度N。当预设的区域向量组合方式为基于胶囊网络的区域向量组合方式时,第二预设维度J大于或等于2,且小于或等于第一预设维度N。
S1006,基于前馈神经网络对各个组合区域词向量进行深层变换,得到第二预设维度的中间区域词向量。
具体地,如图11中的1103所示,计算机设备通过前馈神经网络(Feedforward Neural Network,FNN)对各个组合区域词向量进行处理时,具体可以按照以下公式生成各个中间区域词向量f J:f J=max(0,O jW 1+b 1)W 2+b 2。例如第二预设维度为J,则计算机设备可以生成J个中间区域词向量F={f 1,f 2,……,f J}。
S1008,计算机设备将第二预设维度的中间区域词向量进行拼接,得到拼接词向量,并对拼接词向量进行一次线性变换,得到聚集词向量。
具体地,如图11中的1103所示,“Concat&Linear”便指计算机设备将J个中间区域词向量F={f 1,f 2,……,f J}进行拼接(Concat)后,再进行一次线性变换(Linear)。
S1010,计算机设备基于聚集词向量生成目标词向量对应的目标隐状态。
具体地,聚集词向量分为第一聚集词向量M h和第二聚集词向量M g。计算机设备可以基于第一聚集词向量M h和第二聚集词向量M g计算候选隐状态h~ t和门控参数g t:候选隐状态h~ t=tanh(M h),门控参数g t=σ(M g)。这样,计算机设备便可以基于候选隐状态h~ t和门控参数g t计算目标词向量的目标隐状态h t:目标隐状态h t=(1-g t)⊙h t-1+g t⊙h~ t
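Putting S1002–S1010 together, the following sketch runs one full time step with the graph-convolution combination (so J = N). The parameter names and shapes, the use of abs() to keep the toy degree entries positive, and the sharing of the feedforward-network parameters between the M_h and M_g branches are all assumptions made for illustration, not details given by the text.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rnn_step(x_t, h_prev, P):
    """One time step covering S1002-S1010 with the graph-convolution combination."""
    spliced = np.concatenate([x_t, h_prev])
    M = {}
    for name in ('h', 'g'):                                        # build M_h and M_g
        Z = np.stack([W @ spliced for W in P[f'W1_{name}']])        # S1002: regional word vectors
        norms = np.linalg.norm(Z, axis=1, keepdims=True)
        A = np.abs(Z @ Z.T) / (norms @ norms.T)                     # S1004: edge weights (abs only keeps toy degrees positive)
        D = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
        O = sigmoid(D @ A @ D @ Z @ P[f'Wg_{name}'])                # combined regional word vectors
        F = np.maximum(0.0, O @ P['W_a'] + P['b_a']) @ P['W_b'] + P['b_b']   # S1006: deep transform
        M[name] = F.reshape(-1) @ P[f'Wo_{name}']                   # S1008: concat + linear
    h_tilde, g = np.tanh(M['h']), sigmoid(M['g'])                   # S1010: candidate state and gate
    return (1.0 - g) * h_prev + g * h_tilde

rng = np.random.default_rng(0)
P = {'W_a': rng.normal(size=(6, 8)), 'b_a': rng.normal(size=8),
     'W_b': rng.normal(size=(8, 6)), 'b_b': rng.normal(size=6)}
for name in ('h', 'g'):
    P[f'W1_{name}'] = [rng.normal(size=(6, 9)) for _ in range(3)]   # N=3 first weight matrices
    P[f'Wg_{name}'] = rng.normal(size=(6, 6))                       # second weight matrix
    P[f'Wo_{name}'] = rng.normal(size=(3 * 6, 5))                   # final linear map to a 5-dim vector
x_t, h_prev = rng.normal(size=4), rng.normal(size=5)
print(rnn_step(x_t, h_prev, P))                                     # target hidden state h_t, shape (5,)
```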
在一个实施例中,例如在人机对话的应用场景中,计算机设备接收到变长的语音信息,计算机设备将每一时刻接收到的语音信息转换为文本信息,并将该文本信息映射为目标词向量,通过上述任一实施例中用于语言处理的循环神经网络中隐状态的生成方法中的步骤,生成各个目标词向量的目标隐状态。
计算机设备可以计算生成的多个目标隐状态的平均隐状态,将该平均隐状 态作为h t-1,X t为0向量。基于h t-1和X t计算第一聚集词向量M h和第二聚集词向量M g。计算机设备基于第一聚集词向量M h和第二聚集词向量M g计算候选隐状态h~ t和门控参数g t:候选隐状态h~ t=tanh(M h),门控参数g t=σ(M g),并按照公式h t=(1-g t)⊙h t-1+g t⊙h~ t,得到中间隐状态h t。例如中间隐状态h t为一个包含100个向量元素的向量,则可以用中间隐状态h t与包含100*Y的权重矩阵W v相乘,得到包含Y个向量元素的中间向量。通过softmax(中间向量),可以得到Y个概率值,每一个概率值代表对应单词表中的一个词的概率。例如Y为10000,则计算机设备可以得到10000个概率值。
计算机设备将Y个概率值中的最大概率值对应的词作为当前人机对话计算机设备需要做出答复的第一个词。计算机设备将计算机设备需要做出答复的第一个词的词向量作为X t,将中间隐状态h t作为h t-1,继续执行基于h t-1和X t计算第一聚集词向量M h和第二聚集词向量M g的步骤,按照同样的计算步骤,计算机设备可以得到需要做出答复的第二个词、第三个词、第四个词……。直到得到的最大概率值符合结束条件,则结束迭代。进一步地,结束条件可以为最大概率值对应的词为指定的结束符号。
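The reply-generation loop in this dialogue example can be sketched as a greedy decoder. The step function, the vocabulary size Y, the embedding table, and the max_len safety cap are stand-ins; only the overall flow (average hidden state, zero start vector, softmax over h_t·W_v, argmax word fed back in, stop at the end symbol) follows the description above.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def greedy_reply(encoder_states, step_fn, W_v, embeddings, end_id, max_len=20):
    """Greedy reply generation: average hidden state plus a zero word vector to start,
    emit the argmax word each step, feed its embedding back in, stop at end_id."""
    h = np.mean(encoder_states, axis=0)       # average hidden state used as h_{t-1}
    x = np.zeros(embeddings.shape[1])         # X_t starts as the zero vector
    reply = []
    for _ in range(max_len):                  # max_len is a safety cap, not from the text
        h = step_fn(x, h)                     # intermediate hidden state h_t
        word_id = int(np.argmax(softmax(h @ W_v)))   # most probable of the Y words
        if word_id == end_id:                 # specified end symbol stops the loop
            break
        reply.append(word_id)
        x = embeddings[word_id]               # next input is this word's vector
    return reply

rng = np.random.default_rng(0)
d_h, Y = 6, 10
W_step = rng.normal(size=(2 * d_h, d_h))                         # stand-in for the full RNN step
step_fn = lambda x, h: np.tanh(np.concatenate([x, h]) @ W_step)
print(greedy_reply(rng.normal(size=(3, d_h)), step_fn,
                   rng.normal(size=(d_h, Y)), rng.normal(size=(Y, d_h)), end_id=0))
```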
图2、3、8和10为一个实施例中用于语言处理的循环神经网络中隐状态的生成方法的流程示意图。应该理解的是,虽然图2、3、8和10的流程图中的各个步骤按照箭头的指示依次显示,但是这些步骤并不是必然按照箭头指示的顺序依次执行。除非本文中有明确的说明,这些步骤的执行并没有严格的顺序限制,这些步骤可以以其它的顺序执行。而且,图2、3、8和10中的至少一部分步骤可以包括多个子步骤或者多个阶段,这些子步骤或者阶段并不必然是在同一时刻执行完成,而是可以在不同的时刻执行,这些子步骤或者阶段的执行顺序也不必然是依次进行,而是可以与其它步骤或者其它步骤的子步骤或者阶段的至少一部分轮流或者交替地执行。
在一个实施例中,如图12所示,提供了一种用于语言处理的循环神经网络中隐状态的生成装置1200,该装置可以通过软件、硬件、或者二者结合实现成为计算机设备的部分或者全部;该装置包括区域词向量生成模块1201、区域词向量组合模块1202、聚集变换处理模块1203和目标隐状态生成模块1204,其中:
区域词向量生成模块1201,用于生成第一时刻输入的目标词向量的至少两个维度的区域词向量。
区域词向量组合模块1202,用于将各个区域词向量进行组合,得到至少两个维度的组合区域词向量。
聚集变换处理模块1203,用于基于前馈神经网络将各个组合区域词向量进行聚集变换处理,得到目标词向量对应的聚集词向量。
目标隐状态生成模块1204,用于基于聚集词向量生成目标词向量对应的目标隐状态。
在一个实施例中,区域词向量生成模块,还用于:获取至少两个第一权重矩阵,每个第一权重矩阵用于生成对应的区域词向量;确定第一时刻输入的目标词向量,并获取上一时刻的历史词向量对应的历史隐状态;基于第一权重矩阵和历史隐状态生成目标词向量的至少两个维度的区域词向量。
在一个实施例中,区域词向量生成模块,还用于:将目标词向量与历史隐状态进行拼接,得到拼接词向量;根据拼接词向量和第一权重矩阵生成区域词向量矩阵;区域词向量矩阵包括至少两个维度的区域词向量。
在一个实施例中,区域词向量组合模块,还用于:确定各个区域词向量间的边权重;按照确定的各个边权重生成各个区域词向量共同对应的邻接矩阵;分别将邻接矩阵中各个维度的各个边权重进行加和,得到度矩阵;基于邻接矩阵和度矩阵生成至少两个维度的组合区域词向量。
在一个实施例中,区域词向量组合模块,还用于:确定各个区域词向量共同对应的区域词向量矩阵;获取用于生成组合区域词向量矩阵的第二权重矩阵;根据邻接矩阵、度矩阵、区域词向量矩阵和第二权重矩阵生成组合区域词向量矩阵;组合区域词向量矩阵中包括至少两个维度的区域词向量。
在一个实施例中,区域词向量组合模块,还用于:
根据用于生成组合区域词向量的第三权重矩阵确定各个区域词向量对应的至少两个预测向量;
确定各个区域词向量对应的至少两个先验概率对数;根据先验概率对数确定各个区域词向量对应的耦合系数;基于耦合系数和预测向量生成至少两个维度的候选组合区域词向量;
再次从所述确定各个区域词向量对应的至少两个先验概率对数的步骤开始执行,对候选组合区域词向量进行迭代计算,直至符合预设迭代条件时停止迭代,将停止迭代时的至少两个维度的候选组合区域词向量确定为至少两个维度的组合区域词向量。
在一个实施例中,区域词向量组合模块,还用于:确定各个组合区域词向量与对应的各个预测向量间的标量积;将各个标量积与对应的先验概率对数进行加和,得到重新确定的各个区域词向量对应的先验概率对数。
在一个实施例中,聚集变换处理模块,还用于:基于前馈神经网络对各个组合区域词向量进行变换,得到变换后的组合区域词向量;将各个变换后的组 合区域词向量进行拼接,得到拼接后的词向量;对拼接后的词向量进行线性变换,得到目标词向量对应的聚集词向量。
在一个实施例中,聚集变换处理模块,还用于:根据第四权重矩阵和第一偏置向量对各个组合区域词向量进行线性变换,得到各个组合区域词向量对应的临时词向量;分别选取各个临时词向量与向量阈值中的最大向量值;根据第五权重矩阵和第二偏置向量对各个最大向量值分别进行线性变换,得到变换后的组合区域词向量。
在一个实施例中,聚集词向量包括第一聚集词向量和第二聚集词向量,第一聚集词向量与第二聚集词向量不同;
目标隐状态生成模块,还用于:基于第一聚集词向量和对应的第一激活函数确定目标词向量对应的候选隐状态;基于第二聚集词向量和对应的第二激活函数确定目标词向量对应的门控参数;根据候选隐状态、门控参数和上一时刻的历史词向量的历史隐状态生成目标词向量对应的目标隐状态。
图1示出了一个实施例中计算机设备的内部结构图。该计算机设备具体可以是终端或服务器。如图1所示,该计算机设备包括该计算机设备包括通过系统总线连接的处理器、存储器和网络接口。可以理解的是,当计算机设备为终端时,该计算机设备还可以包括显示屏和输入装置等。其中,存储器包括非易失性存储介质和内存储器。该计算机设备的非易失性存储介质存储有操作系统,还可存储有计算机程序,该计算机程序被处理器执行时,可使得处理器实现用于语言处理的循环神经网络中隐状态的生成方法。
该内存储器中也可储存有计算机程序,该计算机程序被处理器执行时,可使得处理器执行用于语言处理的循环神经网络中隐状态的生成方法。当计算机设备为终端时,计算机设备的显示屏可以是液晶显示屏或者电子墨水显示屏,计算机设备的输入装置可以是显示屏上覆盖的触摸层,也可以是计算机设备外壳上设置的按键、轨迹球或触控板,还可以是外接的键盘、触控板或鼠标等。
本领域技术人员可以理解,图1中示出的结构,仅仅是与本申请方案相关的部分结构的框图,并不构成对本申请方案所应用于其上的计算机设备的限定,具体的计算机设备可以包括比图中所示更多或更少的部件,或者组合某些部件,或者具有不同的部件布置。
在一个实施例中,本申请提供的用于语言处理的循环神经网络中隐状态的生成装置可以实现为一种计算机程序的形式,计算机程序可在如图1所示的计算机设备上运行。计算机设备的存储器中可存储组成该用于语言处理的循环神经网络中隐状态的生成装置的各个程序模块,比如,图12所示的区域词向量生成模块1201、区域词向量组合模块1202、聚集变换处理模块1203和目标隐状 态生成模块1204。各个程序模块构成的计算机程序使得处理器执行本说明书中描述的本申请各个实施例的用于语言处理的循环神经网络中隐状态的生成方法中的步骤。
例如,图1所示的计算机设备可以通过如图12所示的用于语言处理的循环神经网络中隐状态的生成装置1200中的区域词向量生成模块1201执行生成第一时刻输入的目标词向量的至少两个维度的区域词向量。计算机设备可通过区域词向量组合模块1202执行将各个区域词向量进行组合,得到至少两个维度的组合区域词向量。计算机设备可通过聚集变换处理模块1203执行基于前馈神经网络将各个组合区域词向量进行聚集变换处理,得到目标词向量对应的聚集词向量。计算机设备可通过目标隐状态生成模块1204执行基于聚集词向量生成目标词向量对应的目标隐状态。
在一个实施例中,提供了一种计算机设备,包括存储器和处理器,存储器存储有计算机程序,计算机程序被处理器执行时,使得处理器执行上述用于语言处理的循环神经网络中隐状态的生成方法的步骤。此处用于语言处理的循环神经网络中隐状态的生成方法的步骤可以是上述各个实施例的用于语言处理的循环神经网络中隐状态的生成方法中的步骤。
在一个实施例中,提供了一种计算机可读存储介质,存储有计算机程序,计算机程序被处理器执行时,使得处理器执行上述用于语言处理的循环神经网络中隐状态的生成方法的步骤。此处用于语言处理的循环神经网络中隐状态的生成方法的步骤可以是上述各个实施例的用于语言处理的循环神经网络中隐状态的生成方法中的步骤。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机程序来指令相关的硬件来完成,所述的程序可存储于一非易失性计算机可读取存储介质中,该程序在执行时,可包括如上述各个方法的实施例的流程。其中,本申请所提供的各个实施例中所使用的对存储器、存储、数据库或其它介质的任何引用,均可包括非易失性和/或易失性存储器。非易失性存储器可包括只读存储器(Read-Only Memory,ROM)、可编程只读存储器(Programmable Read-Only Memory,PROM)、电可编程只读存储器(Electrically Programmable Read-Only Memory,EPROM)、电可擦除可编程只读存储器(Electrically Erasable Programmable Read-Only Memory,EEPROM)或闪存。易失性存储器可包括随机存取存储器(Random Access Memory,RAM)或者外部高速缓冲存储器。作为说明而非局限,RAM以多种形式可得,诸如静态随机存取存储器(Static Random Access Memory,SRAM)、动态随机存取存储器(Dynamic Random Access Memory,DRAM)、同步动态随机存取存储器 (Synchronous Dynamic Random Access Memory,SDRAM)、双数据率SDRAM(Double Data Rate SDRAM,DDR SDRAM)、增强型SDRAM(Enhanced SDRAM,ESDRAM)、同步链路DRAM(SynchLink DRAM,SLDRAM)、总线式DRAM(Rambus DRAM,RDRAM)、以及接口动态随机存储器(Direct Rambus DRAM,DRDRAM)等。
以上实施例的各个技术特征可以进行任意的组合,为使描述简洁,未对上述实施例中的各个技术特征所有可能的组合都进行描述,然而,只要这些技术特征的组合不存在矛盾,都应当认为是本说明书记载的范围。
以上所述实施例仅表达了本申请的几种实施按照预设的,其描述较为具体和详细,但并不能因此而理解为对本申请专利范围的限制。应当指出的是,对于本领域的普通技术人员来说,在不脱离本申请构思的前提下,还可以做出若干变形和改进,这些都属于本申请的保护范围。因此,本申请专利的保护范围应以所附权利要求为准。

Claims (22)

  1. 一种用于语言处理的循环神经网络中隐状态的生成方法,其中,应用于计算机设备中,所述方法包括:
    生成第一时刻输入的目标词向量的至少两个维度的区域词向量;
    将各个所述区域词向量进行组合,得到至少两个维度的组合区域词向量;
    基于前馈神经网络将各个所述组合区域词向量进行聚集变换处理,得到所述目标词向量对应的聚集词向量;
    基于所述聚集词向量生成所述目标词向量对应的目标隐状态。
  2. 根据权利要求1所述的方法,其中,所述生成第一时刻输入的目标词向量的至少两个维度的区域词向量,包括:
    获取至少两个第一权重矩阵,每个所述第一权重矩阵用于生成对应的区域词向量;
    确定第一时刻输入的目标词向量,并获取上一时刻的历史词向量对应的历史隐状态;
    基于所述第一权重矩阵和所述历史隐状态生成所述目标词向量的至少两个维度的区域词向量。
  3. 根据权利要求2所述的方法,其中,所述基于所述第一权重矩阵和所述历史隐状态生成所述目标词向量的至少两个维度的区域词向量,包括:
    将所述目标词向量与所述历史隐状态进行拼接,得到拼接词向量;
    根据所述拼接词向量和所述第一权重矩阵生成区域词向量矩阵;所述区域词向量矩阵包括所述至少两个维度的区域词向量。
  4. 根据权利要求1至3任一项所述的方法,其中,所述将各个所述区域词向量进行组合,得到至少两个维度的组合区域词向量,包括:
    确定各个所述区域词向量间的边权重;
    按照确定的各个所述边权重生成各个所述区域词向量共同对应的邻接矩阵;
    分别将所述邻接矩阵中各个维度的各个所述边权重进行加和,得到度矩阵;
    基于所述邻接矩阵和所述度矩阵生成所述至少两个维度的组合区域词向量。
  5. 根据权利要求4所述的方法,其中,所述基于所述邻接矩阵和所述度矩阵生成至少两个维度的组合区域词向量,包括:
    确定各个所述区域词向量共同对应的区域词向量矩阵;
    获取用于生成组合区域词向量矩阵的第二权重矩阵;
    根据所述邻接矩阵、所述度矩阵、所述区域词向量矩阵和所述第二权重矩阵生成所述组合区域词向量矩阵;所述组合区域词向量矩阵中包括所述至少两个维度的区域词向量。
  6. 根据权利要求1至3任一项所述的方法,其中,所述将各个所述区域词向量进行组合,得到至少两个维度的组合区域词向量,包括:
    根据用于生成组合区域词向量的第三权重矩阵确定各个所述区域词向量对应的至少两个预测向量;
    确定各个所述区域词向量对应的至少两个先验概率对数;根据所述先验概率对数确定各个所述区域词向量对应的耦合系数;基于所述耦合系数和所述预测向量生成至少两个维度的候选组合区域词向量;
    再次从所述确定各个所述区域词向量对应的至少两个先验概率对数的步骤开始执行,对所述候选组合区域词向量进行迭代计算,直至符合预设迭代条件时停止迭代,将停止迭代时的至少两个维度的候选组合区域词向量确定为所述至少两个维度的组合区域词向量。
  7. 根据权利要求6所述的方法,其中,所述确定各个所述区域词向量对应的至少两个先验概率对数,还包括:
    确定各个所述候选组合区域词向量与对应的各个所述预测向量间的标量积;
    将各个所述标量积与对应的所述先验概率对数进行加和,得到重新确定的各个所述区域词向量对应的先验概率对数。
  8. 根据权利要求1至3任一项所述的方法,其中,所述基于前馈神经网络将各个所述组合区域词向量进行聚集变换处理,得到所述目标词向量对应的聚集词向量,包括:
    基于前馈神经网络对各个所述组合区域词向量进行变换,得到变换后的组合区域词向量;
    将各个变换后的组合区域词向量进行拼接,得到拼接后的词向量;
    对拼接后的词向量进行线性变换,得到所述目标词向量对应的聚集词向量。
  9. 根据权利要求8所述的方法,其中,所述基于前馈神经网络对各个所述组合区域词向量进行变换,得到变换后的组合区域词向量,包括:
    根据第四权重矩阵和第一偏置向量对各个所述组合区域词向量进行线性变换,得到各个组合区域词向量对应的临时词向量;
    分别选取各个所述临时词向量与向量阈值中的最大向量值;
    根据第五权重矩阵和第二偏置向量对各个所述最大向量值分别进行线性变换,得到所述变换后的组合区域词向量。
  10. 根据权利要求1至3任一项所述的方法,其中,所述聚集词向量包括第一聚集词向量和第二聚集词向量,所述第一聚集词向量与所述第二聚集词向量不同;
    所述基于所述聚集词向量生成所述目标词向量对应的目标隐状态,包括:
    基于所述第一聚集词向量和对应的第一激活函数确定所述目标词向量对应的候选隐状态;
    基于所述第二聚集词向量和对应的第二激活函数确定所述目标词向量对应的门控参数;
    根据所述候选隐状态、所述门控参数和上一时刻的历史词向量的历史隐状态生成所述目标词向量对应的目标隐状态。
  11. 一种用于语言处理的循环神经网络中隐状态的生成装置,其中,所述装置包括:
    区域词向量生成模块,用于生成第一时刻输入的目标词向量的至少两个维 度的区域词向量;
    区域词向量组合模块,用于将各个所述区域词向量进行组合,得到至少两个维度的组合区域词向量;
    聚集变换处理模块,用于基于前馈神经网络将各个所述组合区域词向量进行聚集变换处理,得到所述目标词向量对应的聚集词向量;
    目标隐状态生成模块,用于基于所述聚集词向量生成所述目标词向量对应的目标隐状态。
  12. 根据权利要求11所述的装置,其中,所述区域词向量生成模块,还用于:
    获取至少两个第一权重矩阵,每个所述第一权重矩阵用于生成对应的区域词向量;
    确定第一时刻输入的目标词向量,并获取上一时刻的历史词向量对应的历史隐状态;
    基于所述第一权重矩阵和所述历史隐状态生成所述目标词向量的至少两个维度的区域词向量。
  13. 根据权利要求12所述的装置,其中,所述区域词向量生成模块,还用于:
    将所述目标词向量与所述历史隐状态进行拼接,得到拼接词向量;
    根据所述拼接词向量和所述第一权重矩阵生成区域词向量矩阵;所述区域词向量矩阵包括所述至少两个维度的区域词向量。
  14. 根据权利要求11至13任一项所述的装置,其中,所述区域词向量组合模块,还用于:
    确定各个所述区域词向量间的边权重;
    按照确定的各个所述边权重生成各个所述区域词向量共同对应的邻接矩阵;
    分别将所述邻接矩阵中各个维度的各个所述边权重进行加和,得到度矩阵;
    基于所述邻接矩阵和所述度矩阵生成所述至少两个维度的组合区域词向 量。
  15. 根据权利要求14所述的装置,其中,所述区域词向量组合模块,还用于:
    确定各个所述区域词向量共同对应的区域词向量矩阵;
    获取用于生成组合区域词向量矩阵的第二权重矩阵;
    根据所述邻接矩阵、所述度矩阵、所述区域词向量矩阵和所述第二权重矩阵生成所述组合区域词向量矩阵;所述组合区域词向量矩阵中包括所述至少两个维度的区域词向量。
  16. 根据权利要求11至13任一项所述的装置,其中,所述区域词向量组合模块,还用于:
    根据用于生成组合区域词向量的第三权重矩阵确定各个所述区域词向量对应的至少两个预测向量;
    确定各个所述区域词向量对应的至少两个先验概率对数;根据所述先验概率对数确定各个所述区域词向量对应的耦合系数;基于所述耦合系数和所述预测向量生成至少两个维度的候选组合区域词向量;
    再次从所述确定各个所述区域词向量对应的至少两个先验概率对数的步骤开始执行,对所述候选组合区域词向量进行迭代计算,直至符合预设迭代条件时停止迭代,将停止迭代时的至少两个维度的候选组合区域词向量确定为所述至少两个维度的组合区域词向量。
  17. 根据权利要求16所述的装置,其中,所述区域词向量组合模块,还用于:
    确定各个所述候选组合区域词向量与对应的各个所述预测向量间的标量积;
    将各个所述标量积与对应的所述先验概率对数进行加和,得到重新确定的各个所述区域词向量对应的先验概率对数。
  18. 根据权利要求11至13任一项所述的装置,其中,所述聚集变换处理 模块,还用于:
    基于前馈神经网络对各个所述组合区域词向量进行变换,得到变换后的组合区域词向量;
    将各个变换后的组合区域词向量进行拼接,得到拼接后的词向量;
    对拼接后的词向量进行线性变换,得到所述目标词向量对应的聚集词向量。
  19. 根据权利要求18所述的装置,其中,所述聚集变换处理模块,还用于:
    根据第四权重矩阵和第一偏置向量对各个所述组合区域词向量进行线性变换,得到各个组合区域词向量对应的临时词向量;
    分别选取各个所述临时词向量与向量阈值中的最大向量值;
    根据第五权重矩阵和第二偏置向量对各个所述最大向量值分别进行线性变换,得到所述变换后的组合区域词向量。
  20. 根据权利要求11至13任一项所述的装置,其中,所述聚集词向量包括第一聚集词向量和第二聚集词向量,所述第一聚集词向量与所述第二聚集词向量不同;
    所述目标隐状态生成模块,还用于:
    基于所述第一聚集词向量和对应的第一激活函数确定所述目标词向量对应的候选隐状态;
    基于所述第二聚集词向量和对应的第二激活函数确定所述目标词向量对应的门控参数;
    根据所述候选隐状态、所述门控参数和上一时刻的历史词向量的历史隐状态生成所述目标词向量对应的目标隐状态。
  21. 一种计算机可读存储介质,其中,计算机可读存储介质中存储有计算机程序,所述计算机程序被处理器执行时,使得所述处理器执行如权利要求1至10中任一项所述方法的步骤。
  22. 一种计算机设备,其中,计算机设备中包括存储器和处理器,所述存储器存储有计算机程序,所述计算机程序被所述处理器执行时,使得所述处理 器执行如权利要求1至10中任一项所述方法的步骤。
PCT/CN2020/081177 2019-04-17 2020-03-25 用于语言处理的循环神经网络中隐状态的生成方法和装置 WO2020211611A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP20790836.9A EP3958148A4 (en) 2019-04-17 2020-03-25 METHOD AND DEVICE FOR GENERATION OF HIDDEN STATE IN A NEURAL NETWORK FOR LANGUAGE PROCESSING
JP2021525643A JP7299317B2 (ja) 2019-04-17 2020-03-25 言語処理のためのリカレントニューラルネットワークにおける隠れ状態の生成方法及び装置
US17/332,318 US20210286953A1 (en) 2019-04-17 2021-05-27 Method and apparatus for generating hidden state in recurrent neural network for language processing

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910309929.5A CN110162783A (zh) 2019-04-17 2019-04-17 用于语言处理的循环神经网络中隐状态的生成方法和装置
CN201910309929.5 2019-04-17

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/332,318 Continuation US20210286953A1 (en) 2019-04-17 2021-05-27 Method and apparatus for generating hidden state in recurrent neural network for language processing

Publications (1)

Publication Number Publication Date
WO2020211611A1 true WO2020211611A1 (zh) 2020-10-22

Family

ID=67639625

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/081177 WO2020211611A1 (zh) 2019-04-17 2020-03-25 用于语言处理的循环神经网络中隐状态的生成方法和装置

Country Status (5)

Country Link
US (1) US20210286953A1 (zh)
EP (1) EP3958148A4 (zh)
JP (1) JP7299317B2 (zh)
CN (1) CN110162783A (zh)
WO (1) WO2020211611A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110162783A (zh) * 2019-04-17 2019-08-23 腾讯科技(深圳)有限公司 用于语言处理的循环神经网络中隐状态的生成方法和装置
CN111274818B (zh) * 2020-01-17 2023-07-14 腾讯科技(深圳)有限公司 词向量的生成方法、装置
CN112036546B (zh) * 2020-08-24 2023-11-17 上海交通大学 序列处理方法及相关设备
CN116363712B (zh) * 2023-03-21 2023-10-31 中国矿业大学 一种基于模态信息度评估策略的掌纹掌静脉识别方法

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014203042A1 (en) * 2013-06-21 2014-12-24 Aselsan Elektronik Sanayi Ve Ticaret Anonim Sirketi Method for pseudo-recurrent processing of data using a feedforward neural network architecture
CN108595601A (zh) * 2018-04-20 2018-09-28 福州大学 一种融入Attention机制的长文本情感分析方法
CN109472031A (zh) * 2018-11-09 2019-03-15 电子科技大学 一种基于双记忆注意力的方面级别情感分类模型及方法
CN109492157A (zh) * 2018-10-24 2019-03-19 华侨大学 基于rnn、注意力机制的新闻推荐方法及主题表征方法
CN110162783A (zh) * 2019-04-17 2019-08-23 腾讯科技(深圳)有限公司 用于语言处理的循环神经网络中隐状态的生成方法和装置

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101778679B1 (ko) 2015-10-02 2017-09-14 네이버 주식회사 딥러닝을 이용하여 텍스트 단어 및 기호 시퀀스를 값으로 하는 복수 개의 인자들로 표현된 데이터를 자동으로 분류하는 방법 및 시스템
US20180129742A1 (en) * 2016-11-10 2018-05-10 Qualcomm Incorporated Natural language object tracking
US10255269B2 (en) * 2016-12-30 2019-04-09 Microsoft Technology Licensing, Llc Graph long short term memory for syntactic relationship discovery
EP3385862A1 (en) * 2017-04-03 2018-10-10 Siemens Aktiengesellschaft A method and apparatus for performing hierarchical entity classification
JP6712973B2 (ja) 2017-09-01 2020-06-24 日本電信電話株式会社 文生成装置、文生成学習装置、文生成方法、及びプログラム
US10515155B2 (en) * 2018-02-09 2019-12-24 Digital Genius Limited Conversational agent
US11170158B2 (en) * 2018-03-08 2021-11-09 Adobe Inc. Abstractive summarization of long documents using deep learning
US11010559B2 (en) * 2018-08-30 2021-05-18 International Business Machines Corporation Multi-aspect sentiment analysis by collaborative attention allocation
CN109800294B (zh) * 2019-01-08 2020-10-13 中国科学院自动化研究所 基于物理环境博弈的自主进化智能对话方法、系统、装置
US11880666B2 (en) * 2019-02-01 2024-01-23 Asapp, Inc. Generating conversation descriptions using neural networks
US11461638B2 (en) * 2019-03-07 2022-10-04 Adobe Inc. Figure captioning system and related methods
EP3893163A1 (en) * 2020-04-09 2021-10-13 Naver Corporation End-to-end graph convolution network

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014203042A1 (en) * 2013-06-21 2014-12-24 Aselsan Elektronik Sanayi Ve Ticaret Anonim Sirketi Method for pseudo-recurrent processing of data using a feedforward neural network architecture
CN108595601A (zh) * 2018-04-20 2018-09-28 福州大学 一种融入Attention机制的长文本情感分析方法
CN109492157A (zh) * 2018-10-24 2019-03-19 华侨大学 基于rnn、注意力机制的新闻推荐方法及主题表征方法
CN109472031A (zh) * 2018-11-09 2019-03-15 电子科技大学 一种基于双记忆注意力的方面级别情感分类模型及方法
CN110162783A (zh) * 2019-04-17 2019-08-23 腾讯科技(深圳)有限公司 用于语言处理的循环神经网络中隐状态的生成方法和装置

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3958148A4 *

Also Published As

Publication number Publication date
EP3958148A1 (en) 2022-02-23
EP3958148A4 (en) 2022-06-15
CN110162783A (zh) 2019-08-23
JP2022507189A (ja) 2022-01-18
JP7299317B2 (ja) 2023-06-27
US20210286953A1 (en) 2021-09-16

Similar Documents

Publication Publication Date Title
US10248664B1 (en) Zero-shot sketch-based image retrieval techniques using neural networks for sketch-image recognition and retrieval
WO2020211611A1 (zh) 用于语言处理的循环神经网络中隐状态的生成方法和装置
US20190303535A1 (en) Interpretable bio-medical link prediction using deep neural representation
CN112508085B (zh) 基于感知神经网络的社交网络链路预测方法
WO2021159714A1 (zh) 一种数据处理方法及相关设备
US20200134380A1 (en) Method for Updating Neural Network and Electronic Device
US20220083868A1 (en) Neural network training method and apparatus, and electronic device
US20230153615A1 (en) Neural network distillation method and apparatus
CN112015868B (zh) 基于知识图谱补全的问答方法
Zheng Gradient descent algorithms for quantile regression with smooth approximation
US11893060B2 (en) Latent question reformulation and information accumulation for multi-hop machine reading
Baek et al. Deep self-representative subspace clustering network
Wang et al. Neural machine-based forecasting of chaotic dynamics
EP4152212A1 (en) Data processing method and device
US20220406034A1 (en) Method for extracting information, electronic device and storage medium
WO2021129668A1 (zh) 训练神经网络的方法和装置
CN114424215A (zh) 多任务适配器神经网络
US9928214B2 (en) Sketching structured matrices in nonlinear regression problems
US20240046067A1 (en) Data processing method and related device
Zhang et al. Learning from few samples with memory network
Jin et al. Dual low-rank multimodal fusion
Pal et al. R-GRU: Regularized gated recurrent unit for handwritten mathematical expression recognition
CN114332469A (zh) 模型训练方法、装置、设备及存储介质
CN114445692A (zh) 图像识别模型构建方法、装置、计算机设备及存储介质
Ziyaden et al. Long-context transformers: A survey

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20790836

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021525643

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2020790836

Country of ref document: EP

Effective date: 20211117