WO2019105157A1 - Summary description generation method, summary description model training method, and computer device - Google Patents


Info

Publication number
WO2019105157A1
WO2019105157A1 (PCT/CN2018/111709 · CN 2018111709 W)
Authority
WO
WIPO (PCT)
Prior art keywords
network
hidden state
moment
decoding
loss function
Prior art date
Application number
PCT/CN2018/111709
Other languages
English (en)
French (fr)
Inventor
陈新鹏
马林
姜文浩
刘威
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Company Limited)
Priority to EP18883654.8A priority Critical patent/EP3683725A4/en
Publication of WO2019105157A1 publication Critical patent/WO2019105157A1/zh
Priority to US16/685,702 priority patent/US11494658B2/en

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 — Pattern recognition
    • G06F18/20 — Analysing
    • G06F18/21 — Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/04 — Architecture, e.g. interconnection topology
    • G06N3/044 — Recurrent networks, e.g. Hopfield networks
    • G06N3/045 — Combinations of networks
    • G06N3/08 — Learning methods
    • G06N3/084 — Backpropagation, e.g. using gradient descent

Definitions

  • the present application relates to the field of machine learning technology, and in particular, to a summary description generation method, a summary description model training method, a computer device, and a storage medium.
  • A summary description refers to describing information with a sentence so as to obtain a summary of that information.
  • the information can be an image, video or text.
  • the summary description model refers to a neural network model for obtaining a summary of the information based on the input information.
  • A typical summary description model includes an encoding network and a decoding network.
  • The encoding network performs feature extraction on the input data (image or text) to obtain a feature vector of the input data.
  • The decoding network takes the feature vector as input to obtain a hidden state at each moment, and predicts the word of the current moment according to that hidden state, thereby outputting a description sentence.
  • In the traditional summary description model training process, the decoding network independently predicts the word of the current moment from the hidden state of the current moment. However, the context of a description sentence is usually related; the traditional decoding network does not consider the correlation between two adjacent hidden states, which reduces the prediction accuracy of the summary description model.
  • A summary description generation method, a summary description model training method, a computer device, and a storage medium are provided.
  • a summary description generation method implemented by a computer device comprising:
  • a summary description model training method implemented by a computer device comprising:
  • the encoding network obtains a feature vector of the training sample, and the decoding network, using a recurrent neural network, decodes the feature vector to obtain a hidden state at each current moment;
  • reversely deriving the hidden state of the previous moment according to the hidden state at each moment output by the decoding network
  • a computer device comprising a memory and a processor, the memory storing a computer program, the computer program being executed by the processor, causing the processor to:
  • a computer device comprising a memory and a processor, the memory storing a computer program, the computer program being executed by the processor, causing the processor to perform the following steps:
  • the encoding network obtains a feature vector of the training sample, and the decoding network, using a recurrent neural network, decodes the feature vector to obtain a hidden state at each current moment;
  • reversely deriving the hidden state of the previous moment according to the hidden state at each moment output by the decoding network
  • a computer readable storage medium storing a computer program, when executed by a processor, causes the processor to perform the following steps:
  • the encoding network obtains a feature vector of the training sample, and the decoding network, using a recurrent neural network, decodes the feature vector to obtain a hidden state at each current moment;
  • reversely deriving the hidden state of the previous moment according to the hidden state at each moment output by the decoding network
  • FIG. 1 is a schematic flow chart of a summary description model training method in an embodiment
  • FIG. 2 is a flow chart of the steps of reversely deriving the hidden state of the previous moment by using the reconstruction network in one embodiment
  • FIG. 3 is a schematic structural diagram of a reconstruction network in an embodiment
  • FIG. 5 is a schematic structural diagram of a summary description model in an embodiment
  • FIG. 6 is a schematic flow chart of a summary description model training method in another embodiment
  • FIG. 7 is a structural block diagram of a summary description model training device in an embodiment
  • Figure 8 is a block diagram showing the structure of a summary description device in an embodiment
  • Figure 9 is a block diagram showing the structure of a computer device in an embodiment.
  • In an embodiment, a summary description model training method is provided that is implemented by a computer device.
  • The summary description model training method specifically includes the following steps:
  • the training samples are related to the actual application of the product, and can be images, text or video.
  • the summary description model is the training object in this embodiment, and the purpose of the training is to obtain the relevant parameters of the summary description model.
  • The summary description model includes an encoding network and a decoding network. The encoding network is used for feature extraction to obtain the feature vector of the input data, and then the feature vector is decoded by the decoding network.
  • At each moment, the decoding network obtains the hidden state of the current moment, from which a word is generated; after a number of moments, a description sentence is obtained.
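The decoding loop described above can be sketched as follows. This is a minimal toy illustration, not the patent's actual model: the weights `W_h`, `W_x`, `W_out`, the vocabulary, and the feedback encoding are all hypothetical, randomly initialized placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions and weights (hypothetical, for illustration only).
hidden_dim, feat_dim = 8, 8
vocab = ["<end>", "a", "man", "rides", "horse"]
W_h = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))    # h_{t-1} -> h_t
W_x = rng.normal(scale=0.1, size=(hidden_dim, feat_dim))      # input -> h_t
W_out = rng.normal(scale=0.1, size=(len(vocab), hidden_dim))  # h_t -> vocabulary scores

def decode(feature_vec, max_steps=10):
    """Greedy decoding: each hidden state h_t is computed from h_{t-1}
    and the current input, then mapped to a word."""
    h = np.zeros(hidden_dim)
    words, hiddens = [], []
    x = feature_vec
    for _ in range(max_steps):
        h = np.tanh(W_h @ h + W_x @ x)       # hidden state of the current moment
        hiddens.append(h)
        word_id = int(np.argmax(W_out @ h))  # word generated from the hidden state
        if vocab[word_id] == "<end>":
            break
        words.append(vocab[word_id])
        x = np.zeros(feat_dim)               # feed the generated word back in
        x[word_id % feat_dim] = 1.0          # (toy one-hot feedback encoding)
    return words, hiddens

sentence, hidden_states = decode(rng.normal(size=feat_dim))
```

A real decoder would use an LSTM cell and a learned word embedding for the feedback input; the structure of the loop — one hidden state and one word per moment — is the point here.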
  • the coding network obtains a feature vector of the training sample.
  • the coding network may employ a convolutional neural network or a recurrent neural network.
  • the feature vector includes a global feature vector and a local feature vector.
  • the global feature vector is a global feature representation of a training sample
  • the local feature vector is a local feature representation of a training sample.
  • The summary description model training includes two stages.
  • In the first stage, the encoding network and the decoding network of the summary description model are trained.
  • The encoding network obtains the feature vector of the training sample, the decoding network decodes the feature vector to obtain the hidden state at each current moment, and the decoding network generates a word according to the hidden state at each moment.
  • The training goal of the first stage is to make the generated word at each moment as close as possible to the actually labeled word, and this is used as the first loss function; this is a training process of maximum likelihood estimation.
  • The preliminary parameters of the summary description model can be obtained.
  • a general summary description model can be obtained that can be used to predict textual summaries of text, images, or videos.
  • In the training process, the hidden state h_t at time t independently predicts the word y'_{t+1} of the current moment, and the hidden state h_{t-1} at time t-1 independently predicts the word y'_t.
  • In the actual prediction process, however, the word generated at each moment depends on the word generated at the previous moment. This difference between training and prediction also limits the performance of the model.
  • Therefore, the present embodiment mines the relationship between adjacent hidden states of the decoding network in the training process to further train the summary description model.
  • After step S104, the method further includes:
  • S106: Reversely derive the hidden state of the previous moment according to the hidden state at each moment output by the decoding network.
  • Reverse derivation means inferring backwards from the hidden state of the current moment output by the decoding network to obtain the hidden state of the previous moment; that is, the previous moment is inferred in reverse. Because the decoding process of the decoding network computes the hidden state of the current moment from the hidden state of the previous moment and the input of the current moment, there is an association between the hidden state of the previous moment and that of the current moment, and this association can be used to infer the hidden state of the previous moment.
  • S108: Obtain a second loss function value according to the reversely derived hidden state of the previous moment and the actual hidden state of the previous moment output by the decoding network.
  • In this embodiment, the root mean square error between the reversely derived hidden state and the actual hidden state h_{t-1} of the previous moment output by the decoding network is used as the second loss function.
  • The goal of the second loss function is to make the difference between the reversely derived hidden state of the previous moment and the actual hidden state of the previous moment as small as possible.
  • The mean square error is the expected value of the square of the difference between a parameter estimate and the true value of the parameter, and is denoted MSE.
  • MSE is a convenient way to measure the "average error". MSE can evaluate the degree of variation of the data: the smaller the value of MSE, the more accurately the prediction model describes the experimental data.
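As a quick numerical illustration of MSE as used for the second loss (the hidden-state values below are arbitrary toy numbers):

```python
import numpy as np

def mse(estimate, truth):
    """Mean square error: the mean of the squared differences
    between an estimate and the true values."""
    estimate = np.asarray(estimate, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(np.mean((estimate - truth) ** 2))

# Reversely derived previous hidden state vs. the actual one (toy values).
h_prev_reconstructed = [0.9, -0.1, 0.4]
h_prev_actual        = [1.0,  0.0, 0.5]
loss = mse(h_prev_reconstructed, h_prev_actual)
# Each component differs by 0.1, so the loss is (0.01 + 0.01 + 0.01) / 3 = 0.01.
```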
  • In this way, the correlation between two adjacent hidden states in the decoding network is fully considered. In the actual inference process, the word generated at each moment depends on the word generated at the previous moment, which also relies on this correlation. Therefore, a second training stage is added to the training process of the summary description model; training to mine the correlation between two adjacent hidden states in the decoding network can avoid the difference between training and prediction and further improve the performance of the summary description model.
  • The training process of the second stage of the summary description model is supervised based on the second loss function.
  • The training process of the second stage adjusts the preliminary parameters of the summary description model determined in the first stage according to the correlation between two adjacent hidden states in the decoding network; when the value of the second loss function reaches a preset value, the corresponding parameters are taken as the final parameters of the summary description model.
  • In an embodiment, the parameters when the second loss function value is smallest are used as the final parameters of the summary description model.
  • In the above summary description model training method, a second training stage is added: according to the hidden state at each moment output by the decoding network, the hidden state of the previous moment is reversely derived, and a second loss function value is obtained from the reversely derived hidden state and the actual hidden state of the previous moment output by the decoding network.
  • Reversely deriving the previous hidden state from the current one fully considers the correlation between two adjacent hidden states in the decoding network; since the actual inference process also depends on this correlation, the accuracy of actual prediction can be improved.
  • Moreover, the difference between training and prediction can be avoided, further improving the performance of the summary description model.
  • In an embodiment, the step of reversely deriving the hidden state of the previous moment includes: using the hidden state at each moment output by the decoding network as the input of the reconstruction network at each corresponding moment, and reversely deriving the hidden state of the previous moment.
  • The reconstruction network is connected to the decoding network, and uses the output of the decoding network to reversely derive, for each moment, the hidden state of the previous moment, based on the dependence of each hidden state on what follows it.
  • the reconstruction network in this embodiment uses a recurrent neural network.
  • FIG. 2 is a flow chart of the steps of reversely deriving the hidden state of the previous moment by using the reconstruction network in the embodiment. As shown in FIG. 2, this step includes:
  • FIG. 3 is a schematic structural diagram of a reconstructed network in an embodiment, including an LSTM hidden layer and a fully connected layer FC.
  • the output of the hidden state at each moment of the decoding network is connected to the LSTM hidden layer of the corresponding time in the reconstructed network, and each LSTM hidden layer is connected to a fully connected layer.
  • i'_t is the input gate
  • f'_t is the forget gate
  • o'_t is the output gate
  • g'_t is the output of the current moment
  • c'_t is the input of the current moment
  • σ(·) is the sigmoid function
  • T is the transformation mapping matrix
  • tanh(·) is the activation function
  • ⊙ is the element-wise multiplication operator.
  • h'_t is the hidden state of the reconstruction network at time t. The hidden state h'_t of the reconstruction network at time t is passed through a fully connected layer to obtain the reversely derived hidden state of the previous moment.
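One step of the reconstruction network can be sketched as below. This is a toy sketch with randomly initialized weights (the parameter names `T`, `W_fc` and the dimensions are hypothetical): the decoder hidden state h_t enters an LSTM-style hidden layer, and a fully connected layer maps the resulting h'_t to an estimate of h_{t-1}.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 6  # hidden size (toy)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy LSTM gate parameters: each gate sees [h'_{t-1}, h_t] concatenated.
T = {gate: rng.normal(scale=0.1, size=(d, 2 * d)) for gate in "ifog"}
# Fully connected layer mapping h'_t to the estimate of h_{t-1}.
W_fc = rng.normal(scale=0.1, size=(d, d))

def reconstruct_step(h_dec_t, h_rec_prev, c_prev):
    """One step of the reconstruction network: LSTM hidden layer + FC output."""
    x = np.concatenate([h_rec_prev, h_dec_t])
    i = sigmoid(T["i"] @ x)          # input gate
    f = sigmoid(T["f"] @ x)          # forget gate
    o = sigmoid(T["o"] @ x)          # output gate
    g = np.tanh(T["g"] @ x)          # candidate values
    c = f * c_prev + i * g           # element-wise products update the cell
    h_rec = o * np.tanh(c)           # h'_t: hidden state of the reconstruction network
    h_prev_estimate = W_fc @ h_rec   # FC layer yields the reversely derived h_{t-1}
    return h_prev_estimate, h_rec, c

h_est, h_rec, c = reconstruct_step(rng.normal(size=d), np.zeros(d), np.zeros(d))
```

In training, `h_est` would be compared against the decoder's actual h_{t-1} to form the second loss.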
  • The reconstruction network is not limited to using an LSTM hidden layer to associate two adjacent hidden states in the decoding network.
  • a network structure such as a Gate Recurrent Unit (GRU), a Multilayer Perceptron (MLP), or a Convolutional Neural Network (CNN) may also be used to associate adjacent hidden states.
  • In another embodiment, the step of reversely deriving the hidden state of the previous moment includes: using a back propagation algorithm to reversely derive the hidden state of the previous moment according to the hidden state at each moment output by the decoding network.
  • The back propagation algorithm works backwards from the output of the neural network to its input and adjusts the parameters of the summary description model. Specifically, the final parameters can be obtained by solving the unconstrained problem with a method such as gradient descent.
  • The back propagation algorithm includes the following steps S1 to S4:
  • S1: Perform a feedforward pass, using the forward propagation formulas to compute the activations of layers L_2, L_3, and so on, up to the output layer L_{n_l}.
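The parameter update underlying such unconstrained optimization can be illustrated with plain gradient descent on a toy quadratic loss (the loss function, step size, and iteration count here are illustrative choices, not the patent's settings):

```python
def gradient_descent(grad, theta0, lr=0.1, steps=100):
    """Generic gradient descent: repeatedly step against the gradient."""
    theta = theta0
    for _ in range(steps):
        theta = theta - lr * grad(theta)
    return theta

# Toy loss L(theta) = (theta - 3)^2, with gradient 2 * (theta - 3);
# the minimum is at theta = 3.
theta_star = gradient_descent(lambda t: 2.0 * (t - 3.0), theta0=0.0)
```

In the actual model, `theta` would be the full set of network parameters and `grad` the gradient computed by back propagation through the decoding and reconstruction networks.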
  • the first phase of the training process includes the following steps:
  • FIG. 5 is a schematic structural diagram of a summary description model in one embodiment.
  • the description model includes an encoding network, a decoding network, and a reconstruction network.
  • The output of the encoding network is connected to the input of the decoding network, and the output of the hidden layer of the decoding network at each moment is connected to the input of the reconstruction network at the corresponding moment.
  • different coding networks may be selected for feature extraction.
  • a convolutional neural network can be used as the encoding network.
  • Convolutional neural networks have excellent performance on ImageNet datasets.
  • Convolutional neural networks that can be used as the encoding network include, for example, the Inception-X series of convolutional neural networks, the ResNet series of convolutional neural networks, and so on.
  • The vector g output by the pooling layer of the convolutional neural network is taken as the global feature representation of the entire image; the feature dimension here is 1536.
  • For text data, a recurrent neural network is used for encoding.
  • In an embodiment, a long short-term memory (LSTM) neural network may be employed.
  • I t is the t-th word of the current sequence
  • T is the length of the text sequence.
  • The hidden state h_t can be obtained from the hidden state h_{t-1} of the previous moment and the current input.
  • i_t is the input gate
  • f_t is the forget gate
  • o_t is the output gate
  • h_t is the hidden state
  • g_t is the output of the current moment
  • x_t is the input of the current moment
  • σ(·) is the sigmoid function
  • T is the transformation mapping matrix
  • ⊙ is the element-wise multiplication operator.
  • The decoding network uses a recurrent neural network to decode the feature vector output by the encoding network.
  • A recurrent neural network (RNN) can calculate the hidden state h_t of the current moment according to the hidden state h_{t-1} of the previous moment and the input of the current moment.
  • In this embodiment, a recurrent neural network with an attention mechanism is used for decoding.
  • The core unit of the recurrent neural network may be an LSTM (long short-term memory) unit, and the decoding form of the decoding network is as follows:
  • i_t is the input gate
  • f_t is the forget gate
  • o_t is the output gate
  • h_t is the hidden state at time t
  • σ(·) is the sigmoid function
  • T is the transformation mapping matrix
  • tanh(·) is the activation function
  • ⊙ is the element-wise multiplication operator
  • g_t is the output of the current moment
  • x_t is the input of the current moment.
  • z t is the context vector obtained by the attention mechanism, and has the following form:
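The context vector z_t of the attention mechanism is a weighted sum of local features, with weights from a softmax over relevance scores. A minimal sketch (the dot-product scoring function and the toy features are illustrative assumptions, not the patent's exact formulation):

```python
import numpy as np

def attention_context(h_prev, local_feats):
    """z_t = sum_i alpha_i * a_i, where alpha is a softmax over
    relevance scores between the hidden state and each local feature."""
    scores = local_feats @ h_prev                  # relevance of each local feature a_i
    scores = scores - scores.max()                 # subtract max for numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()  # attention weights, sum to 1
    z = alpha @ local_feats                        # context vector z_t
    return z, alpha

feats = np.array([[1.0, 0.0],   # toy local feature vectors a_i
                  [0.0, 1.0],
                  [1.0, 1.0]])
z, alpha = attention_context(np.array([1.0, 0.0]), feats)
```

The weights `alpha` shift at each decoding step, so the decoder attends to different local features when generating different words.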
  • S406: Generate a word corresponding to the current moment according to the hidden state at each moment.
  • At each moment, the decoding network obtains the hidden state h_t of the current moment, and generates from it the word y'_{t+1} corresponding to the current moment:
  • W is the transformation matrix that maps the hidden vector to the vocabulary.
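Mapping the hidden state to a word through W can be sketched as follows (the vocabulary and the weight values are toy placeholders):

```python
import numpy as np

vocab = ["a", "dog", "runs"]
W = np.array([[2.0, 0.0],   # maps a 2-d hidden vector to vocabulary scores
              [0.0, 1.0],
              [1.0, 1.0]])

def predict_word(h):
    """Project the hidden state onto the vocabulary and take a softmax."""
    scores = W @ h
    probs = np.exp(scores - scores.max())  # stable softmax
    probs /= probs.sum()
    return vocab[int(np.argmax(probs))], probs

word, probs = predict_word(np.array([1.0, 0.0]))
# scores are [2, 0, 1], so the highest-probability word is "a"
```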
  • S408: Obtain a first loss function value according to the generated word at the current moment and the labeled actual word at the current moment.
  • The summary description model includes an encoding network and a decoding network.
  • the encoding network obtains the feature vector of the training sample, and the decoding network decodes the feature vector to obtain a hidden state of each current moment, and generates a word corresponding to the current moment according to the hidden state of each moment.
  • The first loss function value is obtained according to the generated word corresponding to the current moment and the labeled actual word of the current moment.
  • In this embodiment, the root mean square error is used to measure the difference between the word corresponding to the current moment generated by the decoding network and the labeled actual word of the current moment, and this is taken as the first loss function.
  • The goal of the first loss function is to make the difference between the generated word corresponding to the current moment and the labeled actual word of the current moment as small as possible.
  • The training process of the first stage of the summary description model is supervised based on the first loss function.
  • When the first loss function value reaches a preset value, the corresponding parameters are taken as the preliminary parameters of the summary description model.
  • In an embodiment, the parameters when the first loss function value is smallest are used as the preliminary parameters of the summary description model.
  • the description model includes an encoding network, a decoding network, and a reconstruction network.
  • FIG. 6 is a flow chart of a summary description model training method of an embodiment. As shown in Figure 6, the following steps are included:
  • S608: Generate a word corresponding to the current moment according to the hidden state at each moment.
  • S610: Obtain a first loss function value according to the generated word at the current moment and the labeled actual word at the current moment.
  • According to the hidden state at each moment output by the decoding network, reversely derive the hidden state of the previous moment.
  • The step includes: using the hidden state at each moment output by the decoding network as the input of the reconstruction network at each corresponding moment to reversely derive the hidden state of the previous moment; or using a back propagation algorithm to reversely derive the hidden state of the previous moment according to the hidden state at each moment output by the decoding network.
  • S616: Obtain a second loss function value according to the reversely derived hidden state of the previous moment and the actual hidden state of the previous moment output by the decoding network.
  • The method fully considers the correlation between two adjacent hidden states in the decoding network; since the actual inference process also depends on this correlation, the accuracy of actual prediction can be improved. Moreover, the difference between training and prediction can be avoided, further improving the performance of the summary description model.
  • a summary description generation method is provided, which is executed on the server side and implemented by a computer device on the server side, and includes the following steps:
  • the input information refers to information that the user inputs through the terminal and sends to the server.
  • the server in this embodiment can provide services such as retrieval, classification, or recommendation.
  • the input information can be an image or text.
  • The summary description model is trained by using the summary description model training method of each of the above embodiments.
  • The training method of the summary description model is described in detail in the above embodiments and is not repeated here.
  • a summary description model for one embodiment is shown in FIG.
  • The summary description generation method can be used to make predictions on text data, image data, or video and generate description sentences.
  • The descriptions generated for images can be used for scene classification of images, such as automatic summarization of the images in user albums; they also contribute to image retrieval services and help visually impaired people understand images.
  • For text, this technique can be used to describe the meaning of a paragraph of text, and can further serve text classification and mining.
  • Figure 7 is a block diagram showing the structure of the summary description model training apparatus in one embodiment.
  • a summary description model training apparatus includes an input module 702, a first stage training module 704, a back push module 706, a loss value calculation module 708, and a parameter determination module 710.
  • the input module 702 is configured to input the labeled training sample into the summary description model.
  • The first-stage training module 704 is configured to train the encoding network and the decoding network of the summary description model under the supervision of the first loss function; the encoding network obtains the feature vector of the training sample, and the decoding network, using a recurrent neural network, decodes the feature vector to obtain the hidden state at each current moment.
  • The back-push module 706 is configured to reversely derive the hidden state of the previous moment according to the hidden state at each moment output by the decoding network.
  • The loss value calculation module 708 is configured to obtain a second loss function value according to the reversely derived hidden state of the previous moment and the actual hidden state of the previous moment output by the decoding network.
  • The parameter determination module 710 is configured to obtain, when the second loss function value reaches a preset value, the final parameters of the summary description model determined under the supervision of the second loss function.
  • In the above summary description model training apparatus, a second training stage is added: according to the hidden state at each moment output by the decoding network, the hidden state of the previous moment is reversely derived, and a second loss function value is obtained from the reversely derived hidden state and the actual hidden state of the previous moment output by the decoding network.
  • Reversely deriving the previous hidden state from the current one fully considers the correlation between two adjacent hidden states in the decoding network; since the actual inference process also depends on this correlation, the accuracy of actual prediction can be improved.
  • Moreover, the difference between training and prediction can be avoided, further improving the performance of the summary description model.
  • In an embodiment, the back-push module is configured to use the hidden state at each moment output by the decoding network as the input of the reconstruction network at each corresponding moment, and reversely derive the hidden state of the previous moment.
  • the back push module includes: a reconstruction module and a connection module.
  • The reconstruction module is configured to input the hidden state at each current moment output by the decoding network, together with the hidden state of the previous moment in the reconstruction network, into the LSTM hidden layer of the reconstruction network at the current moment, to obtain the hidden state of the current moment in the reconstruction network.
  • The connection module is configured to input the hidden state of the current moment in the reconstruction network into the fully connected layer to obtain the reversely derived hidden state of the previous moment.
  • In an embodiment, the back-push module is configured to use a back propagation algorithm to reversely derive the hidden state of the previous moment according to the hidden state at each moment output by the decoding network.
  • the first stage training module comprises: an encoding module, a decoding module, a word generating module, a calculating module and a confirming module.
  • The encoding module is configured to input the labeled training samples into the encoding network, extract the features of the training samples, and obtain the feature vectors of the training samples.
  • the decoding module is configured to input the feature vector into the decoding network to obtain a hidden state at each moment.
  • the word generating module is configured to generate a word corresponding to the current moment according to the hidden state of each moment.
  • The calculation module is configured to obtain a first loss function value according to the generated word corresponding to the current moment and the labeled actual word of the current moment.
  • The confirmation module is configured to determine whether the first loss function value reaches a preset value, and end the training of the first stage when the preset value is reached.
  • a summary description generating apparatus including an information acquisition module 802 and a prediction module 804.
  • the information obtaining module 802 is configured to obtain input information.
  • The prediction module 804 is configured to input the input information into the pre-trained summary description model, obtain the feature vector of the input information through the encoding network of the model, and decode the feature vector through the decoding network of the summary description model to generate a summary description of the input information.
  • The summary description model is obtained by training the encoding network and the decoding network under the supervision of the first loss function, reversely deriving the hidden state of the previous moment according to the hidden state at each moment output by the decoding network, and then training under the supervision of the second loss function according to the reversely derived hidden state of the previous moment and the actual hidden state of the previous moment output by the decoding network.
  • In other embodiments, the summary description generating apparatus further includes the structures of the modules of the summary description model training apparatus in the above embodiments, and details are not described herein again.
  • the summary description generating device can be used for text data, image data or video for prediction, and generates a description sentence.
  • the description of image generation can be used for scene classification of images, such as automatic summarization of images in user albums; it also contributes to image retrieval services; and helps visually impaired people understand images.
  • this technique can be used to describe the meaning of the text of the paragraph, and can further serve the classification and mining of text.
  • Figure 9 is a diagram showing the internal structure of a computer device in one embodiment.
  • The computer device includes a processor, a memory, a network interface, an input device, and a display screen connected by a system bus.
  • the memory comprises a non-volatile storage medium and an internal memory.
  • The non-volatile storage medium of the computer device stores an operating system and may also store a computer program which, when executed by the processor, causes the processor to implement the summary description model training method or the summary description generation method.
  • The internal memory may also store a computer program which, when executed by the processor, causes the processor to execute the summary description model training method or the summary description generation method.
  • the display screen of the computer device may be a liquid crystal display or an electronic ink display screen
  • the input device of the computer device may be a touch layer covered on the display screen, or a button, a trackball or a touchpad provided on the computer device casing, and It can be an external keyboard, trackpad or mouse.
  • FIG. 9 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation of the computer device to which the solution of the present application is applied.
  • the specific computer device may It includes more or fewer components than those shown in the figures, or some components are combined, or have different component arrangements.
  • the summary description model training apparatus can be implemented in the form of a computer program that can run on a computer device as shown in FIG. 9.
  • the program modules constituting the summary description model training apparatus, such as the input module, the first-stage training module and the back-inference module shown in FIG. 7, and the information acquisition module and the prediction module shown in FIG. 8, may be stored in the memory of the computer device.
  • the computer program composed of the program modules causes the processor to perform the steps of the summary description model training of the embodiments of the present application described in this specification.
  • the computer device shown in FIG. 9 can perform the step of inputting labeled training samples into the summary description model through the input module of the summary description model training apparatus shown in FIG. 7.
  • the computer device may perform, through the first-stage training module, the first stage of training of the encoding network and the decoding network of the summary description model under the supervision of the first loss function.
  • the computer device can, through the back-inference module, derive in reverse the back-inferred hidden state for the previous moment according to the hidden state at each moment output by the decoding network.
  • the computer device of FIG. 9 may perform the step of acquiring input information through the information acquisition module of the summary description generating apparatus shown in FIG. 8, and, through the prediction module, input the input information into the pre-trained summary description model, obtain the feature vector of the input information through the encoding network of the summary description model, decode the feature vector through the decoding network of the summary description model, and generate a summary description of the input information.
  • a computer device, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the following steps:
  • acquiring input information;
  • inputting the input information into a pre-trained summary description model, obtaining a feature vector of the input information through the encoding network of the summary description model, and decoding the feature vector through the decoding network of the summary description model to generate a summary description of the input information;
  • wherein the encoding network and the decoding network are trained in advance under the supervision of a first loss function, a back-inferred hidden state for the previous moment is derived in reverse according to the hidden state at each moment output by the decoding network, and the summary description model determined under the supervision of a second loss function is obtained according to the back-inferred hidden state for the previous moment and the actual hidden state at the previous moment output by the decoding network.
  • when the computer program is executed by the processor, the processor is further caused to perform the following steps:
  • inputting labeled training samples into the summary description model;
  • training the encoding network and the decoding network of the summary description model under the supervision of the first loss function; the encoding network obtaining feature vectors of the training samples, and the decoding network using a recurrent neural network to decode the feature vectors to obtain the hidden state at each current moment;
  • deriving in reverse, according to the hidden state at each moment output by the decoding network, the back-inferred hidden state for the previous moment;
  • obtaining a second loss function value according to the back-inferred hidden state for the previous moment and the actual hidden state at the previous moment output by the decoding network;
  • when the second loss function value reaches a preset value, obtaining the final parameters of the summary description model determined under the supervision of the second loss function.
  • when the computer program is executed by the processor, the processor is caused to perform the following step: taking the hidden state at each moment output by the decoding network as the input of the reconstruction network at each corresponding moment, and deriving in reverse the back-inferred hidden state for the previous moment.
  • when the computer program is executed by the processor, the processor is caused to perform the following steps:
  • inputting the hidden state at each current moment output by the decoding network and the hidden state at the previous moment in the reconstruction network into the LSTM hidden layer of the reconstruction network at the current moment, to obtain the hidden state at the current moment in the reconstruction network;
  • inputting the hidden state at the current moment in the reconstruction network into a fully connected layer, to obtain the back-inferred hidden state for the previous moment.
  • when the computer program is executed by the processor, the processor is caused to perform the following step: using a backpropagation algorithm, deriving in reverse the back-inferred hidden state for the previous moment according to the hidden state at each moment output by the decoding network.
  • when the computer program is executed by the processor, the processor is caused to perform the following steps:
  • inputting the labeled training samples into the encoding network, extracting the features of the training information, and obtaining the feature vectors of each piece of training information;
  • inputting the feature vectors into the decoding network to obtain the hidden state at each moment;
  • generating the word corresponding to the current moment according to the hidden state at each moment;
  • obtaining a first loss function value according to the generated word corresponding to the current moment and the labeled actual word at the current moment;
  • when the first loss function value reaches a preset value, obtaining the initial parameters of the summary description model.
  • a computer device, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the following steps:
  • inputting labeled training samples into a summary description model;
  • training the encoding network and the decoding network of the summary description model under the supervision of a first loss function; the encoding network obtaining feature vectors of the training samples, and the decoding network using a recurrent neural network to decode the feature vectors to obtain the hidden state at each current moment;
  • deriving in reverse, according to the hidden state at each moment output by the decoding network, a back-inferred hidden state for the previous moment;
  • obtaining a second loss function value according to the back-inferred hidden state for the previous moment and the actual hidden state at the previous moment output by the decoding network;
  • when the second loss function value reaches a preset value, obtaining the final parameters of the summary description model determined under the supervision of the second loss function.
  • when the computer program is executed by the processor, the processor is caused to perform the following step: taking the hidden state at each moment output by the decoding network as the input of the reconstruction network at each corresponding moment, and deriving in reverse the back-inferred hidden state for the previous moment.
  • when the computer program is executed by the processor, the processor is caused to perform the following steps:
  • inputting the hidden state at each current moment output by the decoding network and the hidden state at the previous moment in the reconstruction network into the LSTM hidden layer of the reconstruction network at the current moment, to obtain the hidden state at the current moment in the reconstruction network;
  • inputting the hidden state at the current moment in the reconstruction network into a fully connected layer, to obtain the back-inferred hidden state for the previous moment.
  • when the computer program is executed by the processor, the processor is caused to perform the following step: using a backpropagation algorithm, deriving in reverse the back-inferred hidden state for the previous moment according to the hidden state at each moment output by the decoding network.
  • when the computer program is executed by the processor, the processor is caused to perform the following steps:
  • inputting the labeled training samples into the encoding network, extracting the features of the training information, and obtaining the feature vectors of each piece of training information;
  • inputting the feature vectors into the decoding network to obtain the hidden state at each moment;
  • generating the word corresponding to the current moment according to the hidden state at each moment;
  • obtaining a first loss function value according to the generated word corresponding to the current moment and the labeled actual word at the current moment;
  • when the first loss function value reaches a preset value, obtaining the initial parameters of the summary description model.
  • a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the following steps:
  • acquiring input information;
  • inputting the input information into a pre-trained summary description model, obtaining a feature vector of the input information through the encoding network of the summary description model, and decoding the feature vector through the decoding network of the summary description model to generate a summary description of the input information;
  • wherein the encoding network and the decoding network are trained in advance under the supervision of a first loss function, a back-inferred hidden state for the previous moment is derived in reverse according to the hidden state at each moment output by the decoding network, and the summary description model determined under the supervision of a second loss function is obtained according to the back-inferred hidden state for the previous moment and the actual hidden state at the previous moment output by the decoding network.
  • when the computer program is executed by the processor, the processor is further caused to perform the following steps:
  • inputting labeled training samples into the summary description model;
  • training the encoding network and the decoding network of the summary description model under the supervision of the first loss function; the encoding network obtaining feature vectors of the training samples, and the decoding network using a recurrent neural network to decode the feature vectors to obtain the hidden state at each current moment;
  • deriving in reverse, according to the hidden state at each moment output by the decoding network, the back-inferred hidden state for the previous moment;
  • obtaining a second loss function value according to the back-inferred hidden state for the previous moment and the actual hidden state at the previous moment output by the decoding network;
  • when the second loss function value reaches a preset value, obtaining the final parameters of the summary description model determined under the supervision of the second loss function.
  • when the computer program is executed by the processor, the processor is caused to perform the following step: taking the hidden state at each moment output by the decoding network as the input of the reconstruction network at each corresponding moment, and deriving in reverse the back-inferred hidden state for the previous moment.
  • when the computer program is executed by the processor, the processor is caused to perform the following steps:
  • inputting the hidden state at each current moment output by the decoding network and the hidden state at the previous moment in the reconstruction network into the LSTM hidden layer of the reconstruction network at the current moment, to obtain the hidden state at the current moment in the reconstruction network;
  • inputting the hidden state at the current moment in the reconstruction network into a fully connected layer, to obtain the back-inferred hidden state for the previous moment.
  • when the computer program is executed by the processor, the processor is caused to perform the following step: using a backpropagation algorithm, deriving in reverse the back-inferred hidden state for the previous moment according to the hidden state at each moment output by the decoding network.
  • when the computer program is executed by the processor, the processor is caused to perform the following steps:
  • inputting the labeled training samples into the encoding network, extracting the features of the training information, and obtaining the feature vectors of each piece of training information;
  • inputting the feature vectors into the decoding network to obtain the hidden state at each moment;
  • generating the word corresponding to the current moment according to the hidden state at each moment;
  • obtaining a first loss function value according to the generated word corresponding to the current moment and the labeled actual word at the current moment;
  • when the first loss function value reaches a preset value, obtaining the initial parameters of the summary description model.
  • a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the following steps:
  • inputting labeled training samples into a summary description model;
  • training the encoding network and the decoding network of the summary description model under the supervision of a first loss function; the encoding network obtaining feature vectors of the training samples, and the decoding network using a recurrent neural network to decode the feature vectors to obtain the hidden state at each current moment;
  • deriving in reverse, according to the hidden state at each moment output by the decoding network, a back-inferred hidden state for the previous moment;
  • obtaining a second loss function value according to the back-inferred hidden state for the previous moment and the actual hidden state at the previous moment output by the decoding network;
  • when the second loss function value reaches a preset value, obtaining the final parameters of the summary description model determined under the supervision of the second loss function.
  • when the computer program is executed by the processor, the processor is caused to perform the following step: taking the hidden state at each moment output by the decoding network as the input of the reconstruction network at each corresponding moment, and deriving in reverse the back-inferred hidden state for the previous moment.
  • when the computer program is executed by the processor, the processor is caused to perform the following steps:
  • inputting the hidden state at each current moment output by the decoding network and the hidden state at the previous moment in the reconstruction network into the LSTM hidden layer of the reconstruction network at the current moment, to obtain the hidden state at the current moment in the reconstruction network;
  • inputting the hidden state at the current moment in the reconstruction network into a fully connected layer, to obtain the back-inferred hidden state for the previous moment.
  • when the computer program is executed by the processor, the processor is caused to perform the following step: using a backpropagation algorithm, deriving in reverse the back-inferred hidden state for the previous moment according to the hidden state at each moment output by the decoding network.
  • when the computer program is executed by the processor, the processor is caused to perform the following steps:
  • inputting the labeled training samples into the encoding network, extracting the features of the training information, and obtaining the feature vectors of each piece of training information;
  • inputting the feature vectors into the decoding network to obtain the hidden state at each moment;
  • generating the word corresponding to the current moment according to the hidden state at each moment;
  • obtaining a first loss function value according to the generated word corresponding to the current moment and the labeled actual word at the current moment;
  • when the first loss function value reaches a preset value, obtaining the initial parameters of the summary description model.
  • Non-volatile memory can include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).


Abstract

The present application relates to a summary description generation method, a summary description model training method, a computer device and a storage medium. The summary description model training method includes: inputting labeled training samples into a summary description model; performing a first stage of training on the encoding network and the decoding network of the summary description model under the supervision of a first loss function; deriving in reverse, according to the hidden state at each moment output by the decoding network, a back-inferred hidden state for the previous moment; obtaining a second loss function value according to the back-inferred hidden state for the previous moment and the actual hidden state at the previous moment output by the decoding network; and, when the second loss function value reaches a preset value, obtaining the final parameters of the summary description model determined under the supervision of the second loss function.

Description

Summary description generation method, summary description model training method and computer device
This application claims priority to Chinese Patent Application No. 201711243949.4, entitled "Summary description generation method and apparatus, and summary description model training method and apparatus", filed with the Chinese Patent Office on November 30, 2017, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of machine learning technology, and in particular to a summary description generation method, a summary description model training method, a computer device and a storage medium.
Background
A summary description refers to describing information with sentences to obtain a summary of that information. The information may be an image, a video or text. A summary description model is a neural network model for obtaining a summary of input information from that information.
A typical summary description model includes an encoding network and a decoding network. The encoding network performs feature extraction on the input data (an image or text) to obtain a feature vector of the input data. The decoding network takes the feature vector as input to obtain the hidden state at each moment, and predicts the word for the current moment from the hidden state, thereby outputting a description sentence.
During the training of a conventional summary description model, the decoding network predicts the word for the current moment independently from the hidden state at the current moment. However, the context of a description sentence is usually correlated; a conventional summary decoding network does not take into account the correlation between two adjacent hidden states, which reduces the prediction accuracy of the summary description model.
Summary
According to various embodiments of the present application, a summary description generation method, a summary description model training method, a computer device and a storage medium are provided.
A summary description generation method, implemented by a computer device, includes:
acquiring input information;
inputting the input information into a pre-trained summary description model, obtaining a feature vector of the input information through the encoding network of the summary description model, and decoding the feature vector through the decoding network of the summary description model to generate a summary description of the input information; wherein the encoding network and the decoding network are trained in advance under the supervision of a first loss function, a back-inferred hidden state for the previous moment is derived in reverse according to the hidden state at each moment output by the decoding network, and the summary description model determined under the supervision of a second loss function is obtained according to the back-inferred hidden state for the previous moment and the actual hidden state at the previous moment output by the decoding network.
A summary description model training method, implemented by a computer device, includes:
inputting labeled training samples into a summary description model;
performing a first stage of training on the encoding network and the decoding network of the summary description model under the supervision of a first loss function; the encoding network obtaining feature vectors of the training samples, and the decoding network using a recurrent neural network to decode the feature vectors to obtain the hidden state at each current moment;
deriving in reverse, according to the hidden state at each moment output by the decoding network, a back-inferred hidden state for the previous moment;
obtaining a second loss function value according to the back-inferred hidden state for the previous moment and the actual hidden state at the previous moment output by the decoding network;
when the second loss function value reaches a preset value, obtaining the final parameters of the summary description model determined under the supervision of the second loss function.
A computer device, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the following steps:
acquiring input information;
inputting the input information into a pre-trained summary description model, obtaining a feature vector of the input information through the encoding network of the summary description model, and decoding the feature vector through the decoding network of the summary description model to generate a summary description of the input information; wherein the encoding network and the decoding network are trained in advance under the supervision of a first loss function, a back-inferred hidden state for the previous moment is derived in reverse according to the hidden state at each moment output by the decoding network, and the summary description model determined under the supervision of a second loss function is obtained according to the back-inferred hidden state for the previous moment and the actual hidden state at the previous moment output by the decoding network.
A computer device, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the following steps:
inputting labeled training samples into a summary description model;
training the encoding network and the decoding network of the summary description model under the supervision of a first loss function; the encoding network obtaining feature vectors of the training samples, and the decoding network using a recurrent neural network to decode the feature vectors to obtain the hidden state at each current moment;
deriving in reverse, according to the hidden state at each moment output by the decoding network, a back-inferred hidden state for the previous moment;
obtaining a second loss function value according to the back-inferred hidden state for the previous moment and the actual hidden state at the previous moment output by the decoding network;
when the second loss function value reaches a preset value, obtaining the final parameters of the summary description model determined under the supervision of the second loss function.
A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the following steps:
acquiring input information;
inputting the input information into a pre-trained summary description model, obtaining a feature vector of the input information through the encoding network of the summary description model, and decoding the feature vector through the decoding network of the summary description model to generate a summary description of the input information; wherein the encoding network and the decoding network are trained in advance under the supervision of a first loss function, a back-inferred hidden state for the previous moment is derived in reverse according to the hidden state at each moment output by the decoding network, and the summary description model determined under the supervision of a second loss function is obtained according to the back-inferred hidden state for the previous moment and the actual hidden state at the previous moment output by the decoding network.
A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the following steps:
inputting labeled training samples into a summary description model;
training the encoding network and the decoding network of the summary description model under the supervision of a first loss function; the encoding network obtaining feature vectors of the training samples, and the decoding network using a recurrent neural network to decode the feature vectors to obtain the hidden state at each current moment;
deriving in reverse, according to the hidden state at each moment output by the decoding network, a back-inferred hidden state for the previous moment;
obtaining a second loss function value according to the back-inferred hidden state for the previous moment and the actual hidden state at the previous moment output by the decoding network;
when the second loss function value reaches a preset value, obtaining the final parameters of the summary description model determined under the supervision of the second loss function.
The details of one or more embodiments of the present application are set forth in the accompanying drawings and the description below. Other features, objects and advantages of the present application will become apparent from the specification, the drawings and the claims.
Brief Description of the Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application or in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a summary description model training method in one embodiment;
FIG. 2 is a flowchart of the step of deriving in reverse, using a reconstruction network, the back-inferred hidden state for the previous moment in one embodiment;
FIG. 3 is a schematic structural diagram of a reconstruction network in one embodiment;
FIG. 4 is a flowchart of the steps of the first stage of training of the summary description model in one embodiment;
FIG. 5 is a schematic structural diagram of a summary description model in one embodiment;
FIG. 6 is a schematic flowchart of a summary description model training method in another embodiment;
FIG. 7 is a structural block diagram of a summary description model training apparatus in one embodiment;
FIG. 8 is a structural block diagram of a summary description generating apparatus in one embodiment;
FIG. 9 is a structural block diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application clearer, the present application is further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit it.
As shown in FIG. 1, in one embodiment, a summary description model training method is provided, implemented by a computer device. Referring to FIG. 1, the summary description model training method specifically includes the following steps:
S102: Input labeled training samples into the summary description model.
The training samples are related to the actual application of the product and may be images, text or videos. The labeled training samples should include each training sample together with a description sentence for each training sample. For example, for a given image or text segment I, the description sentence corresponding to the training sample is $y=\{y_1,y_2,\dots,y_i\}$, where $y_i$ is a word constituting the description sentence.
The summary description model is the training object in this embodiment, and the purpose of training is to obtain the relevant parameters of the summary description model. The summary description model includes an encoding network and a decoding network. The encoding network extracts features to obtain a feature vector of the input data, and the decoding network then decodes the feature vector. At each moment the decoding network obtains the hidden state at the current moment, from which a word is generated; after a number of moments a description sentence is obtained.
S104: Train the encoding network and the decoding network of the summary description model under the supervision of the first loss function; the encoding network obtains feature vectors of the training samples, and the decoding network uses a recurrent neural network to decode the feature vectors to obtain the hidden state at each current moment.
The encoding network obtains the feature vectors of the training samples. The encoding network may use a convolutional neural network or a recurrent neural network. The feature vectors include a global feature vector and local feature vectors. The global feature vector is a global feature representation of a training sample, and a local feature vector is a local feature representation of a training sample.
In this embodiment, the training of the summary description model includes two stages. In the first stage, the encoding network and the decoding network of the summary description model are trained with the labeled training samples. The encoding network obtains the feature vectors of the training samples, the decoding network decodes the feature vectors to obtain the hidden state at each current moment, and the decoding network generates a word from the hidden state at each moment. The training objective of the first stage is to make the generated word corresponding to each moment as close as possible to the actual labeled word, and this objective is used as the first loss function; this is the training process of maximum likelihood estimation.
After the first stage of training of the summary description model based on the first loss function is completed, the initial parameters of the summary description model can be obtained. Usually, a conventional summary description model can be obtained from the initial parameters and can be used to predict a text summary of text, an image or a video. However, there is a problem: during training, the hidden state $h_t$ at moment $t$ is used to independently predict the word $y'_{t+1}$ for the current moment, and likewise the hidden state $h_{t-1}$ at the previous moment $t-1$ independently predicts the word $y'_t$, whereas in the actual prediction process the word generated at each moment depends on the word generated at the previous moment. This discrepancy between training and prediction also limits the performance of the model. For this reason, this embodiment proposes to further train the summary description model by taking into account, during training, the relationship between adjacent hidden states of the decoding network.
Specifically, after step S104, the method further includes:
S106: Derive in reverse, according to the hidden state at each moment output by the decoding network, the back-inferred hidden state for the previous moment.
The back-inferred hidden state for the previous moment is the hidden state of the moment preceding the current moment, obtained by inverting the hidden state at the current moment output by the decoding network; that is, the back-inferred hidden state for the previous moment is inferred rather than observed. Since the decoding process of the decoding network computes the hidden state at the current moment from the hidden state at the previous moment and the input at the current moment, there is a correlation between the hidden state at the previous moment and the hidden state at the current moment, and this correlation can be used to infer the hidden state at the previous moment.
S108: Obtain a second loss function value according to the back-inferred hidden state for the previous moment and the actual hidden state at the previous moment output by the decoding network.
In this embodiment, the root mean square error is used to measure the difference between the back-inferred hidden state $\hat{h}_{t-1}$ for the previous moment and the actual hidden state $h_{t-1}$ at the previous moment output by the decoding network, and this is used as the second loss function. The objective of the second loss function is to make the difference between the back-inferred hidden state for the previous moment and the actual hidden state at the previous moment as small as possible. The mean square error is the expected value of the square of the difference between the estimated value of a parameter and its true value, denoted MSE. MSE is a convenient way to measure the "average error" and can evaluate the degree of variation of the data; the smaller the MSE value, the better the accuracy with which the prediction model describes the experimental data.
The smaller the second loss function value, the smaller the difference between the back-inferred hidden state for the previous moment and the actual hidden state at the previous moment. The back-inferred hidden state for the previous moment is derived in reverse from the hidden state at the current moment output by the decoding network, which fully takes into account the correlation between two adjacent hidden states in the decoding network. In the actual inference process, the word generated at each moment depends on the word generated at the previous moment, that is, it also depends on this correlation. Therefore, by adding a second stage of training to the training process of the summary description model and mining the correlation between two adjacent hidden states in the decoding network, the discrepancy between training and prediction can be avoided and the performance of the summary description model further improved.
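As an illustrative sketch (not the claimed implementation), the second loss described above reduces to a mean squared error between the back-inferred hidden states and the decoder's actual hidden states; the array shapes and names below are assumptions for illustration only:

```python
import numpy as np

def second_loss(h_back, h_actual):
    """MSE between the back-inferred hidden states (one per reconstructed
    previous moment) and the decoder's actual hidden states at those moments."""
    diff = h_back - h_actual
    return float(np.mean(diff ** 2))

# Toy example: 3 reconstructed moments, hidden dimension 4.
h_actual = np.zeros((3, 4))
h_back = np.full((3, 4), 0.5)          # every entry off by 0.5
loss = second_loss(h_back, h_actual)   # mean of 0.25 over all entries -> 0.25
```

Training the second stage then amounts to minimizing this value, stopping when it reaches the preset threshold described in step S110.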
S110: When the second loss function value reaches a preset value, obtain the final parameters of the summary description model determined under the supervision of the second loss function.
In this embodiment, the second stage of the training process of the summary description model is supervised based on the second loss function. The second stage of training is the process of adjusting, according to the correlation between two adjacent hidden states in the decoding network, the initial parameters of the summary description model determined in the first stage; when the second loss function value reaches the preset value, the corresponding parameters are taken as the final parameters of the summary description model. Alternatively, during the second stage of training, when the number of iterations exceeds a preset maximum number of iterations, the parameters at which the second loss function value is smallest are taken as the final parameters of the summary description model.
In the above summary description model training method, on the basis of a conventional encoding network and decoding network, a process is added in which the back-inferred hidden state for the previous moment is derived in reverse from the hidden state at each moment output by the decoding network and a second stage of training is performed. In the second stage of training, the second loss function value is obtained according to the back-inferred hidden state for the previous moment and the actual hidden state at the previous moment output by the decoding network. Since the back-inferred hidden state for the previous moment is derived in reverse from the hidden state at the current moment output by the decoding network, the correlation between two adjacent hidden states in the decoding network is fully taken into account; the actual inference process also depends on this correlation, so the accuracy of actual prediction can be improved. Moreover, the discrepancy between training and prediction can be avoided, further improving the performance of the summary description model.
In one embodiment, the step of deriving in reverse the back-inferred hidden state for the previous moment according to the hidden state at each moment output by the decoding network includes: taking the hidden state at each moment output by the decoding network as the input of a reconstruction network at each corresponding moment, and deriving in reverse the back-inferred hidden state for the previous moment. The reconstruction network is connected to the decoding network and, exploiting the fact that the output of the decoding network depends on what follows it, is used to derive in reverse the hidden state of the previous moment corresponding to each moment. The reconstruction network in this embodiment uses a recurrent neural network.
FIG. 2 is a flowchart of the step of deriving in reverse, using the reconstruction network, the back-inferred hidden state for the previous moment in one embodiment. As shown in FIG. 2, this step includes:
S202: Input the hidden state at each current moment output by the decoding network and the hidden state at the previous moment in the reconstruction network into the LSTM hidden layer of the reconstruction network at the current moment, to obtain the hidden state at the current moment in the reconstruction network.
FIG. 3 is a schematic structural diagram of the reconstruction network in one embodiment, which includes LSTM hidden layers and fully connected layers FC. The output of the hidden state of the decoding network at each moment is connected to the LSTM hidden layer of the corresponding moment in the reconstruction network, and each LSTM hidden layer is connected to a fully connected layer.
Specifically, the reconstruction takes the following form: the hidden state $h_t$ at the current moment output by the decoding network and the hidden state $h'_{t-1}$ at the previous moment in the reconstruction network are passed through an LSTM hidden layer to reconstruct the hidden state $h_{t-1}$ of the previous moment, as follows:

$$\begin{pmatrix} i'_t \\ f'_t \\ o'_t \\ g'_t \end{pmatrix} = \begin{pmatrix} \sigma \\ \sigma \\ \sigma \\ \tanh \end{pmatrix} T \begin{pmatrix} h_t \\ h'_{t-1} \end{pmatrix}$$

$$c'_t = f'_t \odot c'_{t-1} + i'_t \odot g'_t$$

$$h'_t = o'_t \odot \tanh(c'_t)$$

where $i'_t$ is the input gate, $f'_t$ the forget gate, $o'_t$ the output gate, $g'_t$ the candidate output at the current moment, $c'_t$ the memory cell at the current moment, $\sigma$ the sigmoid function, $T$ a transformation mapping matrix, $\tanh()$ the activation function, and $\odot$ the element-wise multiplication operator.
S204: Input the hidden state at the current moment in the reconstruction network into the fully connected layer, to obtain the back-inferred hidden state for the previous moment.
In the above equations, $h'_t$ is the hidden state at moment $t$ in the reconstruction network; passing the hidden state $h'_t$ at moment $t$ in the reconstruction network through a fully connected layer yields the back-inferred hidden state $\hat{h}_{t-1}$ for the previous moment.
In this embodiment, the reconstruction network is not limited to using an LSTM hidden layer to relate two adjacent hidden states in the decoding network. Network structures such as a Gated Recurrent Unit (GRU), a Multilayer Perceptron (MLP) or a Convolutional Neural Network (CNN) may also be used to relate adjacent hidden states.
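A single step of the reconstruction LSTM above can be sketched as follows; this is an illustration of the gate equations, not the patented implementation, and the matrix shapes, random seed and helper names are assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def recon_lstm_step(h_dec_t, h_rec_prev, c_rec_prev, T_mat):
    """One reconstruction step: from the decoder hidden state h_t and the
    reconstruction network's previous state (h'_{t-1}, c'_{t-1}), compute
    (h'_t, c'_t) following the gate equations above.
    T_mat plays the role of the transformation matrix T, shape (4*d, 2*d)."""
    d = h_dec_t.shape[0]
    x = np.concatenate([h_dec_t, h_rec_prev])   # (h_t, h'_{t-1})
    gates = T_mat @ x                           # stacked pre-activations
    i = sigmoid(gates[0:d])                     # input gate  i'_t
    f = sigmoid(gates[d:2 * d])                 # forget gate f'_t
    o = sigmoid(gates[2 * d:3 * d])             # output gate o'_t
    g = np.tanh(gates[3 * d:4 * d])             # candidate   g'_t
    c = f * c_rec_prev + i * g                  # c'_t
    h = o * np.tanh(c)                          # h'_t
    return h, c

# Toy usage with random weights (illustrative only):
rng = np.random.default_rng(0)
d = 4
T_mat = rng.standard_normal((4 * d, 2 * d)) * 0.1
h, c = recon_lstm_step(rng.standard_normal(d), np.zeros(d), np.zeros(d), T_mat)
```

In the full model, $h'_t$ would then pass through the fully connected layer of step S204 to produce the back-inferred hidden state $\hat{h}_{t-1}$.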
In another embodiment, the step of deriving in reverse the back-inferred hidden state for the previous moment according to the hidden state at each moment output by the decoding network includes: using a backpropagation algorithm, deriving in reverse the back-inferred hidden state for the previous moment according to the hidden state at each moment output by the decoding network.
The backpropagation algorithm, as its name suggests, solves from the output of the neural network back to the input and adjusts the parameters of the summary description model. Specifically, methods for solving unconstrained problems such as gradient descent can be used to obtain the final parameters. The backpropagation algorithm includes the following steps S1 to S4:
S1: Perform a feedforward pass, using the forward propagation formulas to compute the activations of $L_2$, $L_3$, and so on up to the output layer $L_{n_l}$.
S2: For each output unit $i$ of layer $n_l$ (the output layer), compute the residual.
S3: For each layer $l = n_l-1, n_l-2, \dots, 2$, compute the residual of the $i$-th node of layer $l$.
S4: Compute the final partial derivative values from the residuals.
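Steps S1 to S4 can be illustrated on a tiny two-layer sigmoid network with a squared-error loss; this sketch is an assumed minimal example (layer sizes, seed and variable names are not from the patent), with the analytic gradient checked against a finite difference:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Tiny network: a2 = sigmoid(W1 a1), a3 = sigmoid(W2 a2),
# loss L = 0.5 * ||a3 - y||^2.
def forward(W1, W2, a1):
    a2 = sigmoid(W1 @ a1)                        # S1: feedforward pass
    a3 = sigmoid(W2 @ a2)
    return a2, a3

def backprop(W1, W2, a1, y):
    a2, a3 = forward(W1, W2, a1)
    delta3 = (a3 - y) * a3 * (1 - a3)            # S2: output-layer residual
    delta2 = (W2.T @ delta3) * a2 * (1 - a2)     # S3: hidden-layer residual
    gW2 = np.outer(delta3, a2)                   # S4: partial derivatives
    gW1 = np.outer(delta2, a1)
    return gW1, gW2

rng = np.random.default_rng(1)
W1, W2 = rng.standard_normal((3, 2)), rng.standard_normal((1, 3))
a1, y = rng.standard_normal(2), np.array([0.3])
gW1, gW2 = backprop(W1, W2, a1, y)

# Finite-difference check of one entry of gW1:
eps = 1e-6
Wp = W1.copy(); Wp[0, 0] += eps
Wm = W1.copy(); Wm[0, 0] -= eps
lp = 0.5 * np.sum((forward(Wp, W2, a1)[1] - y) ** 2)
lm = 0.5 * np.sum((forward(Wm, W2, a1)[1] - y) ** 2)
numeric = (lp - lm) / (2 * eps)
```

The numeric estimate agrees with the residual-based derivative, which is the essence of steps S2 to S4.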
FIG. 4 is a flowchart of the steps of the first stage of training of the summary description model in one embodiment. As shown in FIG. 4, the first stage of training includes the following steps:
S402: Input the labeled training samples into the encoding network, extract the features of the training information, and obtain the feature vectors of each piece of training information.
FIG. 5 is a schematic structural diagram of the summary description model in one embodiment. As shown in FIG. 5, the description model includes an encoding network, a decoding network and a reconstruction network. The output of the encoding network is connected to the input of the decoding network, and the output of the hidden layer of the decoding network at each moment is connected to the input of the reconstruction network at the corresponding moment.
Specifically, the encoding network extracts a global feature representation $g$ of the training sample and 64 local feature representations $s=\{s_1,\dots,s_{64}\}$. In this embodiment, different encoding networks can be selected for feature extraction according to the characteristics of the training samples.
Specifically, when the object to be described is an image, a convolutional neural network can be used as the encoding network. Convolutional neural networks perform very well on the ImageNet dataset. There are currently many convolutional neural networks that can serve as the encoding network, such as the Inception-X series and the ResNet series.
Specifically, the vector $g$ output by the pooling layer of the convolutional neural network is taken as the global feature representation of the whole image, where the feature dimension of $g$ is 1536. The output $s$ of the last Inception-C module of the convolutional neural network is taken as the local feature representation of the image, where $s=\{s_1,\dots,s_{64}\}$ and the dimension of each local feature vector $s_i$ is also 1536. Thus, inputting an image into the convolutional encoding network yields a global feature vector $g \in \mathbb{R}^{1536}$ of the image and a series of local feature vectors $s_i \in \mathbb{R}^{1536}$ for different regions of the image.
When the object for which a description is to be generated is text data, since text data has strong temporal characteristics, a recurrent neural network can be used to encode the text data. In this embodiment, a Long Short-Term Memory (LSTM) neural network can be used for text data.
Denote the text sequence to be input as $I=\{I_1,\dots,I_T\}$, where $I_t$ is the $t$-th word of the current sequence and $T$ is the length of the text sequence. In the LSTM, the hidden state $h_t$ is obtained from the hidden state $h_{t-1}$ at the previous moment $t-1$ and the input at the current moment, in the following form:

$$h_t = \mathrm{LSTM}(h_{t-1}, I_t)$$

In the encoding network, the specific form of the LSTM is as follows:

$$\begin{pmatrix} i_t \\ f_t \\ o_t \\ g_t \end{pmatrix} = \begin{pmatrix} \sigma \\ \sigma \\ \sigma \\ \tanh \end{pmatrix} T \begin{pmatrix} h_{t-1} \\ x_t \end{pmatrix}$$

$$c_t = f_t \odot c_{t-1} + i_t \odot g_t$$

$$h_t = o_t \odot \tanh(c_t)$$

where $i_t$ is the input gate, $f_t$ the forget gate, $o_t$ the output gate, $h_t$ the hidden state, $g_t$ the output at the current moment, $x_t$ the input at the current moment, $\sigma$ the sigmoid function, $T$ a transformation mapping matrix, and $\odot$ the element-wise multiplication operator.
In this embodiment, the hidden state $h_T$ at moment $T$ is taken as the overall representation feature vector $g$ of a training text, that is, $g = h_T$. The hidden state $h_t$ produced at each moment in the LSTM serves as a local representation feature of the training text, that is, $s=\{s_1,\dots,s_T\}=\{h_1,\dots,h_T\}$.
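The text-encoding scheme above (global feature $g=h_T$, local features $s=\{h_1,\dots,h_T\}$) can be sketched as follows; this is an illustrative assumption-laden sketch (random weights, small dimensions, simple word vectors), not the patented encoder:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, T_mat):
    """Standard LSTM cell following the gate equations above;
    T_mat has shape (4*d, d + e) for hidden size d and input size e."""
    d = h_prev.shape[0]
    z = T_mat @ np.concatenate([h_prev, x_t])
    i, f, o = sigmoid(z[:d]), sigmoid(z[d:2 * d]), sigmoid(z[2 * d:3 * d])
    g = np.tanh(z[3 * d:])
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c

def encode_text(word_vecs, T_mat, d):
    """Run the LSTM over the word vectors: the final hidden state h_T is
    the global feature g, and the per-step states are the local features s."""
    h, c = np.zeros(d), np.zeros(d)
    locals_s = []
    for x_t in word_vecs:
        h, c = lstm_step(x_t, h, c, T_mat)
        locals_s.append(h)
    return h, locals_s            # g = h_T, s = {h_1, ..., h_T}

rng = np.random.default_rng(2)
d, e, T_len = 4, 3, 5
T_mat = rng.standard_normal((4 * d, d + e)) * 0.1
g, s = encode_text(rng.standard_normal((T_len, e)), T_mat, d)
```

Here the global vector `g` is identical to the last local state `s[-1]`, matching $g=h_T$ in the text.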
S404: Input the feature vectors into the decoding network to obtain the hidden state at each moment.
Specifically, the decoding network is a structure that uses a recurrent neural network to decode the feature vectors output by the encoding network. A recurrent neural network (RNN) can compute the hidden state $h_t$ at the current moment from the hidden state $h_{t-1}$ at the previous moment and the input at the current moment. Specifically, a recurrent neural network with an attention mechanism is used for decoding; the core unit of the recurrent neural network may be an LSTM (Long Short-Term Memory), and the decoding form of the decoding network is as follows:

$$\begin{pmatrix} i_t \\ f_t \\ o_t \\ g_t \end{pmatrix} = \begin{pmatrix} \sigma \\ \sigma \\ \sigma \\ \tanh \end{pmatrix} T \begin{pmatrix} h_{t-1} \\ x_t \\ z_t \end{pmatrix}$$

$$c_t = f_t \odot c_{t-1} + i_t \odot g_t$$

$$h_t = o_t \odot \tanh(c_t)$$

where $i_t$ is the input gate, $f_t$ the forget gate, $o_t$ the output gate, $h_t$ the hidden state at moment $t$, $\sigma$ the sigmoid function, $T$ a transformation mapping matrix, $\tanh()$ the activation function, $\odot$ the element-wise multiplication operator, $g_t$ the output at the current moment, and $x_t$ the input at the current moment. $z_t$ is the context vector obtained by the attention mechanism, of the following form:

$$z_t = \sum_{i} \alpha(s_i, h_{t-1})\, s_i$$

The term $\alpha(s_i, h_{t-1})$ in the above equation expresses the correlation between the local features $s=\{s_1,\dots,s_T\}$ of the input data and the previous hidden state.
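The context vector $z_t$ can be sketched as follows; note that the concrete form of the score $\alpha(s_i, h_{t-1})$ is not given in the text, so the bilinear score used here is an assumption for illustration, as are the matrix shapes and seed:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_context(s, h_prev, W_a):
    """Context vector z_t = sum_i alpha(s_i, h_{t-1}) * s_i, with an
    assumed bilinear score s_i^T W_a h_{t-1} standing in for alpha."""
    scores = np.array([s_i @ W_a @ h_prev for s_i in s])
    alpha = softmax(scores)                  # weights over the local features
    z = (alpha[:, None] * s).sum(axis=0)     # weighted sum of local features
    return z, alpha

rng = np.random.default_rng(3)
s = rng.standard_normal((6, 4))              # six local feature vectors
h_prev = rng.standard_normal(4)
W_a = rng.standard_normal((4, 4))
z, alpha = attention_context(s, h_prev, W_a)
```

The softmax guarantees the attention weights are non-negative and sum to one, so $z_t$ is a convex combination of the local features.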
S406: Generate the word corresponding to the current moment according to the hidden state at each moment.
At each moment, the decoding network obtains the hidden state $h_t$ at the current moment, and from this hidden state generates the word $y'_{t+1}$ corresponding to the current moment:

$$y'_{t+1} = \arg\max \, \mathrm{Softmax}(W h_t)$$

where $W$ is the transformation matrix that maps the hidden vector to the vocabulary.
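The word-prediction step above can be sketched as a projection followed by an argmax; the tiny vocabulary and hand-picked weights below are purely illustrative assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def predict_word(h_t, W, vocab):
    """y'_{t+1} = argmax Softmax(W h_t): project the hidden state onto the
    vocabulary and pick the highest-probability word."""
    probs = softmax(W @ h_t)
    return vocab[int(np.argmax(probs))], probs

vocab = ["a", "dog", "runs", "<eos>"]
W = np.array([[0.1, 0.0],
              [2.0, 0.0],     # row for "dog" dominates for this h_t
              [0.0, 0.5],
              [0.0, 0.1]])
h_t = np.array([1.0, 0.2])
word, probs = predict_word(h_t, W, vocab)   # word == "dog"
```

Repeating this at successive moments yields the description sentence word by word.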
S408: Obtain a first loss function value according to the generated word corresponding to the current moment and the labeled actual word at the current moment.
The summary description model includes the encoding network and the decoding network. The encoding network obtains the feature vectors of the training samples; the decoding network decodes the feature vectors to obtain the hidden state at each current moment and generates the word corresponding to the current moment from the hidden state at each moment; the first loss function value is obtained from the generated word corresponding to the current moment and the labeled current actual word.
In this embodiment, the root mean square error is used to measure the difference between the word corresponding to the current moment generated by the decoding network and the labeled current actual word, and this is used as the first loss function. The objective of the first loss function is to make the difference between the word corresponding to the current moment and the labeled current actual word as small as possible.
S410: When the first loss function value reaches a preset value, obtain the initial parameters of the summary description model.
In this embodiment, the first stage of the training process of the summary description model is supervised based on the first loss function. When the first loss function value reaches a preset value, the corresponding parameters are taken as the initial parameters of the summary description model. Alternatively, during the first stage of training, when the number of iterations exceeds a preset maximum number of iterations, the parameters at which the first loss function value is smallest are taken as the initial parameters of the summary description model.
The summary description model of one embodiment is shown in FIG. 5; the description model includes an encoding network, a decoding network and a reconstruction network. The output of the encoding network is connected to the input of the decoding network, and the output of the hidden layer of the decoding network at each moment is connected to the input of the reconstruction network at the corresponding moment.
FIG. 6 is a flowchart of a summary description model training method of one embodiment. As shown in FIG. 6, it includes the following steps:
S602: Input labeled training samples into the summary description model.
S604: Input the labeled training samples into the encoding network, extract the features of the training information, and obtain the feature vectors of each piece of training information.
S606: Input the feature vectors into the decoding network to obtain the hidden state at each moment.
S608: Generate the word corresponding to the current moment according to the hidden state at each moment.
S610: Obtain a first loss function value according to the generated word corresponding to the current moment and the labeled actual word at the current moment.
S612: When the first loss function value reaches a preset value, obtain the initial parameters of the summary description model.
S614: Derive in reverse, according to the hidden state at each moment output by the decoding network, the back-inferred hidden state for the previous moment.
Specifically, this step includes: taking the hidden state at each moment output by the decoding network as the input of the reconstruction network at each corresponding moment and deriving in reverse the back-inferred hidden state for the previous moment; or using a backpropagation algorithm to derive in reverse the back-inferred hidden state for the previous moment according to the hidden state at each moment output by the decoding network.
S616: Obtain a second loss function value according to the back-inferred hidden state for the previous moment and the actual hidden state at the previous moment output by the decoding network.
S618: When the second loss function value reaches a preset value, obtain the final parameters of the summary description model determined under the supervision of the second loss function.
This method fully takes into account the correlation between two adjacent hidden states in the decoding network; since the actual inference process also depends on this correlation, the accuracy of actual prediction can be improved. Moreover, the discrepancy between training and prediction can be avoided, further improving the performance of the summary description model.
In one embodiment, a summary description generation method is provided, which runs on the server side and is implemented by a computer device on the server side, including the following steps:
S1: Acquire input information.
The input information is information entered by a user through a terminal and sent to the server. The server in this embodiment can provide services such as retrieval, classification or recommendation. The input information may be a picture or text.
S2: Input the input information into the pre-trained summary description model, obtain the feature vector of the input information through the encoding network of the summary description model, and decode the feature vector through the decoding network of the summary description model to generate a summary description of the input information.
Specifically, the summary description model is obtained by training with the summary description model training method of the above embodiments. The specific training method of the summary description model has been described in the above embodiments and is not repeated here. The summary description model of one embodiment is shown in FIG. 5. This summary description generation method can be used to make predictions on text data, image data or video and to generate description sentences. Descriptions generated for images can be used for scene classification of images, such as automatic summarization and categorization of images in a user's album; they also facilitate image retrieval services and help visually impaired people understand images. For text note data, this technique can be used to describe the meaning of the passage of text and can further serve text classification and mining.
FIG. 7 is a schematic structural diagram of the summary description model training apparatus in one embodiment. As shown in FIG. 7, a summary description model training apparatus includes: an input module 702, a first-stage training module 704, a back-inference module 706, a loss value calculation module 708 and a parameter determination module 710.
The input module 702 is configured to input labeled training samples into the summary description model.
The first-stage training module 704 is configured to train the encoding network and the decoding network of the summary description model under the supervision of the first loss function; the encoding network obtains the feature vectors of the training samples, and the decoding network uses a recurrent neural network to decode the feature vectors to obtain the hidden state at each current moment.
The back-inference module 706 is configured to derive in reverse, according to the hidden state at each moment output by the decoding network, the back-inferred hidden state for the previous moment.
The loss value calculation module 708 is configured to obtain a second loss function value according to the back-inferred hidden state for the previous moment and the actual hidden state at the previous moment output by the decoding network.
The parameter determination module 710 is configured to obtain, when the second loss function value reaches a preset value, the final parameters of the summary description model determined under the supervision of the second loss function.
The above summary description model training apparatus, on the basis of a conventional encoding network and decoding network, adds a process in which the back-inferred hidden state for the previous moment is derived in reverse from the hidden state at each moment output by the decoding network and a second stage of training is performed. In the second stage of training, the second loss function value is obtained according to the back-inferred hidden state for the previous moment and the actual hidden state at the previous moment output by the decoding network. Since the back-inferred hidden state for the previous moment is derived in reverse from the hidden state at the current moment output by the decoding network, the correlation between two adjacent hidden states in the decoding network is fully taken into account; the actual inference process also depends on this correlation, so the accuracy of actual prediction can be improved. Moreover, the discrepancy between training and prediction can be avoided, further improving the performance of the summary description model.
In another embodiment, the back-inference module is configured to take the hidden state at each moment output by the decoding network as the input of the reconstruction network at each corresponding moment, and derive in reverse the back-inferred hidden state for the previous moment.
In yet another embodiment, the back-inference module includes: a reconstruction module and a connection module.
The reconstruction module is configured to input the hidden state at each current moment output by the decoding network and the hidden state at the previous moment in the reconstruction network into the LSTM hidden layer of the reconstruction network at the current moment, to obtain the hidden state at the current moment in the reconstruction network.
The connection module is configured to input the hidden state at the current moment in the reconstruction network into the fully connected layer, to obtain the back-inferred hidden state for the previous moment.
In still another embodiment, the back-inference module is configured to use a backpropagation algorithm to derive in reverse the back-inferred hidden state for the previous moment according to the hidden state at each moment output by the decoding network.
In one embodiment, the first-stage training module includes: an encoding module, a decoding module, a word generation module, a calculation module and a confirmation module.
The encoding module is configured to input the labeled training samples into the encoding network, extract the features of the training information, and obtain the feature vectors of each piece of training information.
The decoding module is configured to input the feature vectors into the decoding network to obtain the hidden state at each moment.
The word generation module is configured to generate the word corresponding to the current moment according to the hidden state at each moment.
The calculation module is configured to obtain a first loss function value according to the generated word corresponding to the current moment and the labeled actual word at the current moment.
The confirmation module is configured to determine whether the first loss function value reaches a preset value, and to end the first stage of training when the preset value is reached.
In one embodiment, a summary description generating apparatus is provided, as shown in FIG. 8, including an information acquisition module 802 and a prediction module 804.
The information acquisition module 802 is configured to acquire input information.
The prediction module 804 is configured to input the input information into a pre-trained summary description model, obtain the feature vector of the input information through the encoding network of the summary description model, and decode the feature vector through the decoding network of the summary description model to generate a summary description of the input information; wherein the encoding network and the decoding network are trained in advance under the supervision of the first loss function, the back-inferred hidden state for the previous moment is derived in reverse according to the hidden state at each moment output by the decoding network, and the summary description model determined under the supervision of the second loss function is obtained according to the back-inferred hidden state for the previous moment and the actual hidden state at the previous moment output by the decoding network.
Specifically, the summary description generating apparatus further includes the modules of the summary description model training apparatus of the above embodiments, which are not repeated here.
This summary description generating apparatus can be used to make predictions on text data, image data or video and to generate description sentences. Descriptions generated for images can be used for scene classification of images, such as automatic summarization and categorization of images in a user's album; they also facilitate image retrieval services and help visually impaired people understand images. For text note data, this technique can be used to describe the meaning of the passage of text and can further serve text classification and mining.
FIG. 9 shows an internal structure diagram of the computer device in one embodiment. As shown in FIG. 9, the computer device includes a processor, a memory, a network interface, an input device and a display screen connected through a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program which, when executed by the processor, causes the processor to implement the summary description model training method or the summary description generation method. The internal memory may also store a computer program which, when executed by the processor, causes the processor to perform the summary description model training method or the summary description generation method. The display screen of the computer device may be a liquid crystal display or an electronic ink display; the input device of the computer device may be a touch layer covering the display screen, a button, trackball or touchpad provided on the housing of the computer device, or an external keyboard, touchpad or mouse.
Those skilled in the art can understand that the structure shown in FIG. 9 is only a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution of the present application is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
In one embodiment, the summary description model training apparatus provided by the present application can be implemented in the form of a computer program, and the computer program can run on a computer device as shown in FIG. 9. The memory of the computer device can store the program modules constituting the summary description model training apparatus, for example the input module, the first-stage training module and the back-inference module shown in FIG. 7, and the information acquisition module and the prediction module shown in FIG. 8. The computer program composed of the program modules causes the processor to perform the steps of the summary description model training of the embodiments of the present application described in this specification.
For example, the computer device shown in FIG. 9 can perform the step of inputting labeled training samples into the summary description model through the input module of the summary description model training apparatus shown in FIG. 7. The computer device can perform, through the first-stage training module, the first stage of training of the encoding network and the decoding network of the summary description model under the supervision of the first loss function. The computer device can, through the back-inference module, derive in reverse the back-inferred hidden state for the previous moment according to the hidden state at each moment output by the decoding network.
As another example, the computer device of FIG. 9 can perform the step of acquiring input information through the information acquisition module of the summary description generating apparatus shown in FIG. 8, and, through the prediction module, perform the step of inputting the input information into the pre-trained summary description model, obtaining the feature vector of the input information through the encoding network of the summary description model, decoding the feature vector through the decoding network of the summary description model, and generating a summary description of the input information.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the following steps:
acquiring input information;
inputting the input information into a pre-trained summary description model, obtaining a feature vector of the input information through the encoding network of the summary description model, and decoding the feature vector through the decoding network of the summary description model to generate a summary description of the input information; wherein the encoding network and the decoding network are trained in advance under the supervision of a first loss function, a back-inferred hidden state for the previous moment is derived in reverse according to the hidden state at each moment output by the decoding network, and the summary description model determined under the supervision of a second loss function is obtained according to the back-inferred hidden state for the previous moment and the actual hidden state at the previous moment output by the decoding network.
In one embodiment, when the computer program is executed by the processor, the processor is caused to perform the following steps:
inputting labeled training samples into the summary description model;
training the encoding network and the decoding network of the summary description model under the supervision of the first loss function; the encoding network obtaining feature vectors of the training samples, and the decoding network using a recurrent neural network to decode the feature vectors to obtain the hidden state at each current moment;
deriving in reverse, according to the hidden state at each moment output by the decoding network, the back-inferred hidden state for the previous moment;
obtaining a second loss function value according to the back-inferred hidden state for the previous moment and the actual hidden state at the previous moment output by the decoding network;
when the second loss function value reaches a preset value, obtaining the final parameters of the summary description model determined under the supervision of the second loss function.
In one embodiment, when the computer program is executed by the processor, the processor is caused to perform the following step: taking the hidden state at each moment output by the decoding network as the input of the reconstruction network at each corresponding moment, and deriving in reverse the back-inferred hidden state for the previous moment.
In one embodiment, when the computer program is executed by the processor, the processor is caused to perform the following steps:
inputting the hidden state at each current moment output by the decoding network and the hidden state at the previous moment in the reconstruction network into the LSTM hidden layer of the reconstruction network at the current moment, to obtain the hidden state at the current moment in the reconstruction network;
inputting the hidden state at the current moment in the reconstruction network into a fully connected layer, to obtain the back-inferred hidden state for the previous moment.
In one embodiment, when the computer program is executed by the processor, the processor is caused to perform the following step: using a backpropagation algorithm, deriving in reverse the back-inferred hidden state for the previous moment according to the hidden state at each moment output by the decoding network.
In one embodiment, when the computer program is executed by the processor, the processor is caused to perform the following steps:
inputting the labeled training samples into the encoding network, extracting the features of the training information, and obtaining the feature vectors of each piece of training information;
inputting the feature vectors into the decoding network to obtain the hidden state at each moment;
generating the word corresponding to the current moment according to the hidden state at each moment;
obtaining a first loss function value according to the generated word corresponding to the current moment and the labeled actual word at the current moment;
when the first loss function value reaches a preset value, obtaining the initial parameters of the summary description model.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the following steps:
inputting labeled training samples into a summary description model;
training the encoding network and the decoding network of the summary description model under the supervision of a first loss function; the encoding network obtaining feature vectors of the training samples, and the decoding network using a recurrent neural network to decode the feature vectors to obtain the hidden state at each current moment;
deriving in reverse, according to the hidden state at each moment output by the decoding network, a back-inferred hidden state for the previous moment;
obtaining a second loss function value according to the back-inferred hidden state for the previous moment and the actual hidden state at the previous moment output by the decoding network;
when the second loss function value reaches a preset value, obtaining the final parameters of the summary description model determined under the supervision of the second loss function.
In one embodiment, when the computer program is executed by the processor, the processor is caused to perform the following step: taking the hidden state at each moment output by the decoding network as the input of the reconstruction network at each corresponding moment, and deriving in reverse the back-inferred hidden state for the previous moment.
In one embodiment, when the computer program is executed by the processor, the processor is caused to perform the following steps:
inputting the hidden state at each current moment output by the decoding network and the hidden state at the previous moment in the reconstruction network into the LSTM hidden layer of the reconstruction network at the current moment, to obtain the hidden state at the current moment in the reconstruction network;
inputting the hidden state at the current moment in the reconstruction network into a fully connected layer, to obtain the back-inferred hidden state for the previous moment.
In one embodiment, when the computer program is executed by the processor, the processor is caused to perform the following step: using a backpropagation algorithm, deriving in reverse the back-inferred hidden state for the previous moment according to the hidden state at each moment output by the decoding network.
In one embodiment, when the computer program is executed by the processor, the processor is caused to perform the following steps:
inputting the labeled training samples into the encoding network, extracting the features of the training information, and obtaining the feature vectors of each piece of training information;
inputting the feature vectors into the decoding network to obtain the hidden state at each moment;
generating the word corresponding to the current moment according to the hidden state at each moment;
obtaining a first loss function value according to the generated word corresponding to the current moment and the labeled actual word at the current moment;
when the first loss function value reaches a preset value, obtaining the initial parameters of the summary description model.
In one embodiment, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, causes the processor to perform the following steps:
acquiring input information;
inputting the input information into a pre-trained summary description model, obtaining a feature vector of the input information through the encoding network of the summary description model, and decoding the feature vector through the decoding network of the summary description model to generate a summary description of the input information; wherein the encoding network and the decoding network are trained in advance under the supervision of a first loss function, a back-inferred hidden state for the previous moment is derived in reverse according to the hidden state at each moment output by the decoding network, and the summary description model determined under the supervision of a second loss function is obtained according to the back-inferred hidden state for the previous moment and the actual hidden state at the previous moment output by the decoding network.
In one embodiment, when the computer program is executed by the processor, the processor is caused to perform the following steps:
inputting labeled training samples into the summary description model;
training the encoding network and the decoding network of the summary description model under the supervision of the first loss function; the encoding network obtaining feature vectors of the training samples, and the decoding network using a recurrent neural network to decode the feature vectors to obtain the hidden state at each current moment;
deriving in reverse, according to the hidden state at each moment output by the decoding network, the back-inferred hidden state for the previous moment;
obtaining a second loss function value according to the back-inferred hidden state for the previous moment and the actual hidden state at the previous moment output by the decoding network;
when the second loss function value reaches a preset value, obtaining the final parameters of the summary description model determined under the supervision of the second loss function.
In one embodiment, when the computer program is executed by the processor, the processor is caused to perform the following step: taking the hidden state at each moment output by the decoding network as the input of the reconstruction network at each corresponding moment, and deriving in reverse the back-inferred hidden state for the previous moment.
In one embodiment, when the computer program is executed by the processor, the processor is caused to perform the following steps:
inputting the hidden state at each current moment output by the decoding network and the hidden state at the previous moment in the reconstruction network into the LSTM hidden layer of the reconstruction network at the current moment, to obtain the hidden state at the current moment in the reconstruction network;
inputting the hidden state at the current moment in the reconstruction network into a fully connected layer, to obtain the back-inferred hidden state for the previous moment.
In one embodiment, when the computer program is executed by the processor, the processor is caused to perform the following step: using a backpropagation algorithm, deriving in reverse the back-inferred hidden state for the previous moment according to the hidden state at each moment output by the decoding network.
In one embodiment, when the computer program is executed by the processor, the processor is caused to perform the following steps:
inputting the labeled training samples into the encoding network, extracting the features of the training information, and obtaining the feature vectors of each piece of training information;
inputting the feature vectors into the decoding network to obtain the hidden state at each moment;
generating the word corresponding to the current moment according to the hidden state at each moment;
obtaining a first loss function value according to the generated word corresponding to the current moment and the labeled actual word at the current moment;
when the first loss function value reaches a preset value, obtaining the initial parameters of the summary description model.
In one embodiment, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, causes the processor to perform the following steps:
inputting labeled training samples into a summary description model;
training the encoding network and the decoding network of the summary description model under the supervision of a first loss function; the encoding network obtaining feature vectors of the training samples, and the decoding network using a recurrent neural network to decode the feature vectors to obtain the hidden state at each current moment;
deriving in reverse, according to the hidden state at each moment output by the decoding network, a back-inferred hidden state for the previous moment;
obtaining a second loss function value according to the back-inferred hidden state for the previous moment and the actual hidden state at the previous moment output by the decoding network;
when the second loss function value reaches a preset value, obtaining the final parameters of the summary description model determined under the supervision of the second loss function.
In one embodiment, when the computer program is executed by the processor, the processor is caused to perform the following step: taking the hidden state at each moment output by the decoding network as the input of the reconstruction network at each corresponding moment, and deriving in reverse the back-inferred hidden state for the previous moment.
In one embodiment, when the computer program is executed by the processor, the processor is caused to perform the following steps:
inputting the hidden state at each current moment output by the decoding network and the hidden state at the previous moment in the reconstruction network into the LSTM hidden layer of the reconstruction network at the current moment, to obtain the hidden state at the current moment in the reconstruction network;
inputting the hidden state at the current moment in the reconstruction network into a fully connected layer, to obtain the back-inferred hidden state for the previous moment.
In one embodiment, when the computer program is executed by the processor, the processor is caused to perform the following step: using a backpropagation algorithm, deriving in reverse the back-inferred hidden state for the previous moment according to the hidden state at each moment output by the decoding network.
In one embodiment, when the computer program is executed by the processor, the processor is caused to perform the following steps:
inputting the labeled training samples into the encoding network, extracting the features of the training information, and obtaining the feature vectors of each piece of training information;
inputting the feature vectors into the decoding network to obtain the hidden state at each moment;
generating the word corresponding to the current moment according to the hidden state at each moment;
obtaining a first loss function value according to the generated word corresponding to the current moment and the labeled actual word at the current moment;
when the first loss function value reaches a preset value, obtaining the initial parameters of the summary description model.
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机程序来指令相关的硬件来完成,的程序可存储于一非易失性计算机可读取存储介质中,该程序在执行时,可包括如上述各方法的实施例的流程。其中,本申请所提供的各实施例中所使用的对存储器、存储、数据库或其它介质的任何引用,均可包括非易失性和/或易失性存储器。非易失性存储器可包括只读存储器(ROM)、可编程ROM(PROM)、电可编程ROM(EPROM)、电可擦除可编程ROM(EEPROM)或闪存。易失性存储器可包括随机存取存储器(RAM)或者外部高速缓冲存储器。作为说明而非局限,RAM以多种形式可得,诸如静态RAM(SRAM)、动态RAM(DRAM)、同步DRAM(SDRAM)、双数据率SDRAM(DDRSDRAM)、增强型SDRAM(ESDRAM)、同步链路(Synchlink)DRAM(SLDRAM)、存储器总线 (Rambus)直接RAM(RDRAM)、直接存储器总线动态RAM(DRDRAM)、以及存储器总线动态RAM(RDRAM)等。
The technical features of the foregoing embodiments may be combined in any manner. For brevity of description, not all possible combinations of the technical features in the foregoing embodiments are described; however, as long as a combination of these technical features involves no contradiction, it shall be considered to fall within the scope of this specification.
The foregoing embodiments express only several implementations of this application, and their descriptions are relatively specific and detailed, but they shall not be construed as limiting the patent scope of this application. It should be noted that a person of ordinary skill in the art may further make variations and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the appended claims.

Claims (20)

  1. A summary description generation method, the method being performed by a computer device and comprising:
    obtaining input information; and
    inputting the input information into a pre-trained summary description model, obtaining a feature vector of the input information through an encoding network of the summary description model, and decoding the feature vector through a decoding network of the summary description model to generate a summary description of the input information, wherein the encoding network and the decoding network are trained in advance under the supervision of a first loss function, a back-derived hidden state of a preceding moment is backward-derived from a hidden state output by the decoding network at each moment, and the summary description model determined under the supervision of a second loss function is obtained according to the back-derived hidden state of the preceding moment and an actual hidden state of the preceding moment output by the decoding network.
  2. The method according to claim 1, wherein the method further comprises:
    inputting annotated training samples into the summary description model;
    training the encoding network and the decoding network of the summary description model under the supervision of the first loss function, the encoding network obtaining feature vectors of the training samples, and the decoding network, which employs a recurrent neural network, decoding the feature vectors to obtain a hidden state at each current moment;
    backward-deriving the back-derived hidden state of the preceding moment from the hidden state output by the decoding network at each moment;
    obtaining a second loss function value according to the back-derived hidden state of the preceding moment and the actual hidden state of the preceding moment output by the decoding network; and
    when the second loss function value reaches a preset value, obtaining final parameters of the summary description model determined under the supervision of the second loss function.
  3. The method according to claim 2, wherein the backward-deriving the back-derived hidden state of the preceding moment from the hidden state output by the decoding network at each moment comprises: taking the hidden state output by the decoding network at each moment as the input of a reconstruction network at the corresponding moment, and backward-deriving the back-derived hidden state of the preceding moment.
  4. The method according to claim 3, wherein the taking the hidden state output by the decoding network at each moment as the input of the reconstruction network at the corresponding moment and backward-deriving the back-derived hidden state of the preceding moment comprises:
    inputting the hidden state output by the decoding network at each current moment and the hidden state of the reconstruction network at the preceding moment into an LSTM hidden layer of the reconstruction network at the current moment, to obtain the hidden state of the reconstruction network at the current moment; and
    inputting the hidden state of the reconstruction network at the current moment into a fully connected layer, to obtain the back-derived hidden state of the preceding moment.
  5. The method according to claim 2, wherein the backward-deriving the back-derived hidden state of the preceding moment from the hidden state output by the decoding network at each moment comprises: backward-deriving, by using a back-propagation algorithm, the back-derived hidden state of the preceding moment from the hidden state output by the decoding network at each moment.
  6. The method according to claim 2, wherein the training the encoding network and the decoding network of the summary description model under the supervision of the first loss function comprises:
    inputting the annotated training samples into the encoding network, and extracting features of the training samples to obtain a feature vector of each training sample;
    inputting the feature vectors into the decoding network to obtain a hidden state at each moment;
    generating, according to the hidden state at each moment, a word corresponding to the current moment;
    obtaining a first loss function value according to the generated word corresponding to the current moment and an annotated actual word of the current moment; and
    when the first loss function value reaches a preset value, obtaining preliminary parameters of the summary description model.
  7. A summary description model training method, the method being performed by a computer device and comprising:
    inputting annotated training samples into a summary description model;
    training an encoding network and a decoding network of the summary description model under the supervision of a first loss function, the encoding network obtaining feature vectors of the training samples, and the decoding network, which employs a recurrent neural network, decoding the feature vectors to obtain a hidden state at each current moment;
    backward-deriving a back-derived hidden state of a preceding moment from the hidden state output by the decoding network at each moment;
    obtaining a second loss function value according to the back-derived hidden state of the preceding moment and an actual hidden state of the preceding moment output by the decoding network; and
    when the second loss function value reaches a preset value, obtaining final parameters of the summary description model determined under the supervision of the second loss function.
  8. The method according to claim 7, wherein the backward-deriving the back-derived hidden state of the preceding moment from the hidden state output by the decoding network at each moment comprises: taking the hidden state output by the decoding network at each moment as the input of a reconstruction network at the corresponding moment, and backward-deriving the back-derived hidden state of the preceding moment.
  9. The method according to claim 8, wherein the taking the hidden state output by the decoding network at each moment as the input of the reconstruction network at the corresponding moment and backward-deriving the back-derived hidden state of the preceding moment comprises:
    inputting the hidden state output by the decoding network at each current moment and the hidden state of the reconstruction network at the preceding moment into an LSTM hidden layer of the reconstruction network at the current moment, to obtain the hidden state of the reconstruction network at the current moment; and
    inputting the hidden state of the reconstruction network at the current moment into a fully connected layer, to obtain the back-derived hidden state of the preceding moment.
  10. The method according to claim 7, wherein the backward-deriving the back-derived hidden state of the preceding moment from the hidden state output by the decoding network at each moment comprises: backward-deriving, by using a back-propagation algorithm, the back-derived hidden state of the preceding moment from the hidden state output by the decoding network at each moment.
  11. The method according to claim 7, wherein the training the encoding network and the decoding network of the summary description model under the supervision of the first loss function comprises:
    inputting the annotated training samples into the encoding network, and extracting features of the training samples to obtain a feature vector of each training sample;
    inputting the feature vectors into the decoding network to obtain a hidden state at each moment;
    generating, according to the hidden state at each moment, a word corresponding to the current moment;
    obtaining a first loss function value according to the generated word corresponding to the current moment and an annotated actual word of the current moment; and
    when the first loss function value reaches a preset value, obtaining preliminary parameters of the summary description model.
  12. A computer device, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the following steps:
    obtaining input information; and
    inputting the input information into a pre-trained summary description model, obtaining a feature vector of the input information through an encoding network of the summary description model, and decoding the feature vector through a decoding network of the summary description model to generate a summary description of the input information, wherein the encoding network and the decoding network are trained in advance under the supervision of a first loss function, a back-derived hidden state of a preceding moment is backward-derived from a hidden state output by the decoding network at each moment, and the summary description model determined under the supervision of a second loss function is obtained according to the back-derived hidden state of the preceding moment and an actual hidden state of the preceding moment output by the decoding network.
  13. A computer device, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the following steps:
    inputting annotated training samples into a summary description model;
    training an encoding network and a decoding network of the summary description model under the supervision of a first loss function, the encoding network obtaining feature vectors of the training samples, and the decoding network, which employs a recurrent neural network, decoding the feature vectors to obtain a hidden state at each current moment;
    backward-deriving a back-derived hidden state of a preceding moment from the hidden state output by the decoding network at each moment;
    obtaining a second loss function value according to the back-derived hidden state of the preceding moment and an actual hidden state of the preceding moment output by the decoding network; and
    when the second loss function value reaches a preset value, obtaining final parameters of the summary description model determined under the supervision of the second loss function.
  14. The computer device according to claim 13, wherein the computer program, when executed by the processor, causes the processor to perform the following step: taking the hidden state output by the decoding network at each moment as the input of a reconstruction network at the corresponding moment, and backward-deriving the back-derived hidden state of the preceding moment.
  15. The computer device according to claim 14, wherein the computer program, when executed by the processor, causes the processor to perform the following steps:
    inputting the hidden state output by the decoding network at each current moment and the hidden state of the reconstruction network at the preceding moment into an LSTM hidden layer of the reconstruction network at the current moment, to obtain the hidden state of the reconstruction network at the current moment; and
    inputting the hidden state of the reconstruction network at the current moment into a fully connected layer, to obtain the back-derived hidden state of the preceding moment.
  16. The computer device according to claim 13, wherein the computer program, when executed by the processor, causes the processor to perform the following step: backward-deriving, by using a back-propagation algorithm, the back-derived hidden state of the preceding moment from the hidden state output by the decoding network at each moment.
  17. The computer device according to claim 13, wherein the computer program, when executed by the processor, causes the processor to perform the following steps:
    inputting the annotated training samples into the encoding network, and extracting features of the training samples to obtain a feature vector of each training sample;
    inputting the feature vectors into the decoding network to obtain a hidden state at each moment;
    generating, according to the hidden state at each moment, a word corresponding to the current moment;
    obtaining a first loss function value according to the generated word corresponding to the current moment and an annotated actual word of the current moment; and
    when the first loss function value reaches a preset value, obtaining preliminary parameters of the summary description model.
  18. A computer-readable storage medium, storing a computer program which, when executed by a processor, causes the processor to perform the following steps:
    obtaining input information; and
    inputting the input information into a pre-trained summary description model, obtaining a feature vector of the input information through an encoding network of the summary description model, and decoding the feature vector through a decoding network of the summary description model to generate a summary description of the input information, wherein the encoding network and the decoding network are trained in advance under the supervision of a first loss function, a back-derived hidden state of a preceding moment is backward-derived from a hidden state output by the decoding network at each moment, and the summary description model determined under the supervision of a second loss function is obtained according to the back-derived hidden state of the preceding moment and an actual hidden state of the preceding moment output by the decoding network.
  19. A computer-readable storage medium, storing a computer program which, when executed by a processor, causes the processor to perform the following steps:
    inputting annotated training samples into a summary description model;
    training an encoding network and a decoding network of the summary description model under the supervision of a first loss function, the encoding network obtaining feature vectors of the training samples, and the decoding network, which employs a recurrent neural network, decoding the feature vectors to obtain a hidden state at each current moment;
    backward-deriving a back-derived hidden state of a preceding moment from the hidden state output by the decoding network at each moment;
    obtaining a second loss function value according to the back-derived hidden state of the preceding moment and an actual hidden state of the preceding moment output by the decoding network; and
    when the second loss function value reaches a preset value, obtaining final parameters of the summary description model determined under the supervision of the second loss function.
  20. The computer-readable storage medium according to claim 19, wherein the computer program, when executed by the processor, causes the processor to perform the following step: taking the hidden state output by the decoding network at each moment as the input of a reconstruction network at the corresponding moment, and backward-deriving the back-derived hidden state of the preceding moment.
PCT/CN2018/111709 2017-11-30 2018-10-24 Summary description generation method, summary description model training method, and computer device WO2019105157A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP18883654.8A EP3683725A4 (en) 2017-11-30 2018-10-24 METHOD OF GENERATION ABSTRACT DESCRIPTION, METHOD OF TRAINING ABSTRACT DESCRIPTION MODEL AND COMPUTER DEVICE
US16/685,702 US11494658B2 (en) 2017-11-30 2019-11-15 Summary generation method, summary generation model training method, and computer device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711243949.4 2017-11-30
CN201711243949.4A CN108334889B (zh) 2017-11-30 2017-11-30 Summary description generation method and apparatus, and summary description model training method and apparatus

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/685,702 Continuation US11494658B2 (en) 2017-11-30 2019-11-15 Summary generation method, summary generation model training method, and computer device

Publications (1)

Publication Number Publication Date
WO2019105157A1 true WO2019105157A1 (zh) 2019-06-06

Family

ID=62922551

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/111709 WO2019105157A1 (zh) 2017-11-30 2018-10-24 Summary description generation method, summary description model training method, and computer device

Country Status (4)

Country Link
US (1) US11494658B2 (zh)
EP (1) EP3683725A4 (zh)
CN (2) CN108334889B (zh)
WO (1) WO2019105157A1 (zh)



Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160232440A1 (en) * 2015-02-06 2016-08-11 Google Inc. Recurrent neural networks for data item generation
CN107038221A (zh) * 2017-03-22 2017-08-11 Hangzhou Dianzi University Video content description method guided by semantic information
CN107066973A (zh) * 2017-04-17 2017-08-18 Hangzhou Dianzi University Video content description method using a spatio-temporal attention model
CN108334889A (zh) * 2017-11-30 2018-07-27 Tencent Technology (Shenzhen) Company Limited Summary description generation method and apparatus, and summary description model training method and apparatus





Also Published As

Publication number Publication date
EP3683725A1 (en) 2020-07-22
CN110598779B (zh) 2022-04-08
CN110598779A (zh) 2019-12-20
CN108334889B (zh) 2020-04-03
EP3683725A4 (en) 2022-04-20
CN108334889A (zh) 2018-07-27
US20200082271A1 (en) 2020-03-12
US11494658B2 (en) 2022-11-08


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18883654

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE