WO2020108545A1 - Sentence processing method, sentence decoding method, device, storage medium and equipment - Google Patents

Sentence processing method, sentence decoding method, device, storage medium and equipment

Info

Publication number
WO2020108545A1
Authority
WO
WIPO (PCT)
Prior art keywords
vocabulary
jth
unit
query
sentence
Prior art date
Application number
PCT/CN2019/121420
Other languages
English (en)
French (fr)
Inventor
孟凡东
张金超
周杰
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Priority to JP2021516821A (patent JP7229345B2)
Publication of WO2020108545A1
Priority to US17/181,490 (publication US20210174003A1)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/12Use of codes for handling textual entities
    • G06F40/126Character encoding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/12Use of codes for handling textual entities
    • G06F40/151Transformation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • G06F40/58Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation

Definitions

  • Embodiments of the present application relate to the field of sentence processing, and in particular, to a sentence processing method, sentence decoding method, device, storage medium, and equipment.
  • the computer can process one input sentence and output another sentence.
  • machine translation refers to a way of translating sentences from one natural language into sentences of another natural language through a computer.
  • machine translation is to translate sentences through a trained machine learning model. For example, after a user enters the Chinese sentence "housing prices continue to rise" into the machine learning model, the machine learning model outputs the English sentence "The housing prices continued to rise".
  • the machine learning model includes an encoding model and a decoding model.
  • the encoding model is used to encode a source sentence input in one natural language into a sentence vector, and the sentence vector is output to the decoding model; the decoding model is used to decode the sentence vector into a target sentence in another natural language.
  • both the encoding model and the decoding model are composed of neural network models. At present, the accuracy of sentence processing by sentence processing models is low.
  • a sentence processing method, sentence decoding method, device, storage medium, and device are provided.
  • a sentence processing method is provided, which is executed by a sentence processing device and applied to a coding model, where the coding model includes n cascaded processing nodes, each processing node includes a cascaded first unit and at least one second unit, and n ≥ 2; the method includes:
  • the (i-1)th vocabulary vector is the coding vector of the (i-1)th vocabulary among the m vocabularies, i ≤ m;
  • a linear operation and a non-linear operation are performed on the i-th vocabulary and the (i-1)th vocabulary vector by using the first unit in the i-th processing node, and the obtained i-th operation result is output to the at least one second unit for processing to obtain the i-th vocabulary vector;
  • a sentence vector is generated according to the m vocabulary vectors, and the sentence vector is used to determine a target sentence or a target classification.
  • a sentence decoding method is provided, which is executed by a sentence processing device and used in a decoding model, where the decoding model includes a processing node, and the processing node includes a cascaded first unit and at least one second unit;
  • the method includes:
  • at the jth moment, the sentence vector and the jth query state are obtained;
  • the sentence vector is obtained after the source sentence to be encoded is encoded by the coding model;
  • the jth query state is used to query the coded part of the source sentence at the jth moment;
  • the jth source language attention context is the coded part of the source sentence at the jth moment;
  • the first unit in the processing node performs linear and non-linear operations on the jth query state and the jth source language attention context, and outputs the obtained jth operation result to the at least one second unit in the processing node for processing to obtain the jth vocabulary;
  • after k vocabularies are obtained, a target sentence is generated according to the k vocabularies, j ≤ k.
  • a sentence processing device is provided for use in a coding model, the coding model including n cascaded processing nodes, each processing node including a cascaded first unit and at least one second unit, n ≥ 2; the device includes:
  • a word segmentation module, configured to perform word segmentation on the source sentence to be encoded to obtain m vocabularies, m ≤ n;
  • an obtaining module, configured to obtain the i-th vocabulary of the m vocabularies by using the i-th processing node of the n processing nodes, and to obtain the (i-1)th vocabulary vector obtained by the (i-1)th processing node,
  • where the (i-1)th vocabulary vector is the coding vector of the (i-1)th vocabulary among the m vocabularies, i ≤ m;
  • an operation module, configured to perform a linear operation and a non-linear operation on the i-th vocabulary and the (i-1)th vocabulary vector by using the first unit in the i-th processing node, and to output the obtained i-th operation result to the at least one second unit for processing to obtain the i-th vocabulary vector;
  • a generating module, configured to generate a sentence vector according to the m vocabulary vectors after the m vocabulary vectors are obtained, where the sentence vector is used to determine a target sentence or a target classification.
  • a sentence decoding apparatus for use in a decoding model, where the decoding model includes a processing node, and the processing node includes a cascaded first unit and at least one second unit; including:
  • an acquisition module, configured to acquire the sentence vector and the jth query state at the jth moment, where the sentence vector is obtained after the source sentence to be encoded is encoded by the coding model, and the jth query state is used to query the coded part of the source sentence at the jth moment;
  • a generating module, configured to generate a jth source language attention context based on the sentence vector and the jth query state, where the jth source language attention context is the coded part of the source sentence at the jth moment;
  • the generating module is also configured to generate a target sentence according to the k vocabularies after k vocabularies are obtained, j ≤ k.
  • one or more non-volatile storage media are provided, storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to execute at least one of the sentence processing method or the sentence decoding method described above.
  • a sentence processing device which includes a memory and a processor.
  • the memory stores computer-readable instructions.
  • when executing the computer-readable instructions, the processor performs at least one of the sentence processing method or the sentence decoding method described above.
  • Fig. 1 is a schematic structural diagram of a sentence processing system according to some exemplary embodiments
  • FIG. 2 is a schematic diagram of an encoding model provided by some embodiments of the present application.
  • FIG. 3 is a method flowchart of a sentence processing method provided by some embodiments of the present application.
  • FIG. 4 is a schematic diagram of an encoding model provided by some embodiments of the present application.
  • FIG. 5 is a schematic diagram of an encoding model provided by some embodiments of the present application.
  • FIG. 6 is a method flowchart of a sentence processing method provided by some embodiments of the present application.
  • FIG. 9 is a schematic diagram of a decoding model provided by some embodiments of the present application.
  • FIG. 10 is a schematic diagram of a decoding model provided by some embodiments of the present application.
  • FIG. 11 is a schematic diagram of a decoding model provided by some embodiments of the present application.
  • FIG. 12 is a schematic diagram of an encoding model and a decoding model provided by some embodiments of the present application.
  • FIG. 13 is a structural block diagram of a sentence processing apparatus provided by some embodiments of the present application.
  • FIG. 14 is a structural block diagram of a sentence decoding apparatus provided by some embodiments of the present application.
  • FIG. 15 is a structural block diagram of a server provided by still another embodiment of the present application.
  • AI Artificial Intelligence
  • Artificial Intelligence is a theory, method, technology, and application system that uses digital computers or digital computer-controlled machines to simulate, extend, and expand human intelligence, sense the environment, acquire knowledge, and use knowledge to obtain the best results.
  • artificial intelligence is a comprehensive technology in computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a similar way to human intelligence.
  • Artificial intelligence is to study the design principles and implementation methods of various intelligent machines, so that the machine has the functions of perception, reasoning and decision-making.
  • Artificial intelligence technology is a comprehensive subject, covering a wide range of fields, both hardware-level technology and software-level technology.
  • Basic technologies of artificial intelligence generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology and machine learning/deep learning.
  • Key technologies of speech technology include Automatic Speech Recognition (ASR), speech synthesis (Text To Speech, TTS), and voiceprint recognition. Letting computers listen, see, speak, and feel is the development direction of human-computer interaction in the future, and voice is one of the most promising human-computer interaction methods of the future.
  • Natural language processing is an important direction in the field of computer science and artificial intelligence. It studies various theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics. Therefore, research in this field will involve natural language, that is, the language people use every day, so it has a close relationship with the study of linguistics. Natural language processing technologies usually include text processing, semantic understanding, machine translation, robot question answering, knowledge graph and other technologies.
  • Machine learning is a multidisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and so on. It specializes in studying how computers simulate or realize human learning behavior to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve performance.
  • Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and its applications cover all fields of artificial intelligence.
  • Machine learning and deep learning usually include technologies such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and demonstration learning.
  • Artificial intelligence technology has been researched and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, drones, robots, intelligent medical care, and intelligent customer service. It is believed that, with the development of technology, artificial intelligence technology will be applied in more fields and play an increasingly important role.
  • This application mainly relates to two types of application scenarios.
  • In the first type of application scenario, the machine learning model generates a sentence based on a sentence; in the second type of application scenario, the machine learning model classifies a sentence.
  • the two types of application scenarios are described below.
  • the first type of application scenarios may include multiple application scenarios.
  • the following three application scenarios are illustrated by machine translation, human-machine dialogue, and automatic text generation.
  • Machine translation refers to a translation method of translating a sentence in one natural language into a sentence in another natural language through a computer.
  • this machine translation translates sentences through a trained machine learning model.
  • the machine learning model is trained through a large number of translated corpus samples.
  • the translated corpus samples include the correspondence between multiple sets of Chinese corpus and English corpus. Each Chinese corpus corresponds to an English corpus as a translation result.
  • for example, after the user enters the Chinese sentence "housing prices continue to rise" into the machine learning model, the English translation "The housing prices continued to rise" is output.
  • the following describes the application scenarios of machine translation based on the calling method of the machine learning model.
  • the entrance to a machine learning model is an input box.
  • the machine learning model is set in an application program, and an input box is provided in the application program.
  • the user can find the input box in the application, enter sentence A in the input box, and the machine learning model uses the sentence A as the source sentence to be translated.
  • the sentence A may be manually input by the user or copied from other text, which is not limited in this embodiment.
  • the machine learning model is set in the server, and an input box is provided on the webpage corresponding to the server.
  • the user can start a browser to open the webpage, find the input box in the webpage, and enter sentence A in the input box; the browser sends sentence A to the server, and the machine learning model in the server uses sentence A as the source sentence to be translated.
  • the sentence A may be manually input by the user or copied from other text, which is not limited in this embodiment.
  • the entrance of the machine learning model is not visible to the user.
  • the machine learning model is embedded in an application, or the machine learning model can be called by an application.
  • when a user browses a text using the application and selects sentence A in the text, the application displays operation options for sentence A, where the operation options include a translation option.
  • the application calls the machine learning model, and sends the sentence A to the machine learning model for translation.
  • Man-machine dialogue refers to a dialogue mode in which the computer answers the sentence input by the user.
  • the human-machine dialogue responds to sentences through a trained machine learning model.
  • the machine learning model is trained through a large number of dialogue samples, which include multiple sets of dialogues in the same natural language or in different natural languages.
  • the user inputs the sentence "how many days before the Spring Festival" into the machine learning model, and outputs a response sentence "60 days before the Spring Festival".
  • Automatic text generation refers to a text generation method in which the computer writes a sentence or a passage of text based on a given sentence.
  • when the number of words in the input sentence is greater than the number of words in the output sentence, this can be understood as extracting the content of the input sentence, which applies to application scenarios such as abstract extraction; when the number of words in the input sentence is less than the number of words in the output sentence, this can be understood as expanding the content of the input sentence, which applies to application scenarios such as sentence rewriting and article generation.
  • the text is automatically generated by the trained machine learning model.
  • a text about flower promotion is output.
  • the second type of application scenarios may include multiple application scenarios.
  • the following uses the three application scenarios of sentiment analysis, part-of-speech analysis, and entity analysis as examples.
  • Sentiment analysis refers to the classification method of analyzing the user's emotions according to the sentences through the computer.
  • the emotions mentioned here can include emotions such as sadness and happiness, moods such as depression and burnout, interpersonal stances such as indifference and alienation, attitudes such as likes and dislikes, and so on, which are not limited in this embodiment.
  • the sentiment analysis analyzes sentences through a trained machine learning model.
  • the machine learning model is trained through a large number of sentiment analysis samples.
  • the sentiment analysis samples include the correspondence between multiple sets of sentences and emotions, and each sentence corresponds to a sentiment.
  • the user inputs the sentence "I am very happy" into the machine learning model, and outputs the classification result of "happy".
  • Part-of-speech analysis refers to a classification method that analyzes the part-of-speech of the vocabulary in the sentence through the computer.
  • the part-of-speech mentioned here may include verbs, nouns, adjectives, prepositions, adverbs, etc. This embodiment is not limited.
  • the part-of-speech analysis analyzes sentences through a trained machine learning model.
  • the machine learning model is trained through a large number of part-of-speech analysis samples.
  • the part-of-speech analysis sample includes correspondence between multiple sets of sentences and parts of speech, and one vocabulary in each sentence corresponds to one part of speech.
  • the user inputs the sentence "I am very happy” into the machine learning model, and outputs the classification results of "I" belonging to the noun classification, "Very” belonging to the adverb classification, and "Happy" belonging to the adjective classification.
  • Entity analysis refers to a classification method in which the computer extracts named entities from a sentence and classifies them.
  • the named entities mentioned here can include people's names, place names, organizations and so on.
  • the named entity analysis analyzes sentences through a trained machine learning model.
  • a large number of named entity analysis samples are used to train the machine learning model.
  • the named entity analysis samples include correspondences between multiple sets of sentences and named entities, and each named entity in a sentence corresponds to one named entity classification.
  • the user inputs the sentence "I am in company” into the machine learning model, and outputs the classification result of "company”.
  • the above machine learning model may be implemented as a neural network model, a support vector machine (Support Vector Machine, SVM), a decision tree (Decision Tree, DT) and other models.
  • the machine learning model is an RNN (Recurrent Neural Network) model.
  • Coding model refers to a model that encodes a sentence in a natural language into a sentence vector.
  • the sentence vector is composed of a vocabulary vector corresponding to each vocabulary in the sentence.
  • the vocabulary vector represents a vector of a vocabulary in the sentence. The generation method of the vocabulary vector will be described below, which will not be repeated here.
  • a word segmentation operation needs to be performed on the sentence to obtain at least two vocabularies, and the word segmentation operation is not limited in this embodiment.
  • the vocabulary mentioned here is obtained by segmenting the sentence, which may be characters, words, sub-words, etc., which is not limited in this embodiment. Among them, the sub-words are obtained based on word segmentation. For example, the word “Peking University” is divided into two sub-words "Beijing" and "University".
  • Decoding model refers to a model that decodes a sentence vector into a natural language sentence. Among them, the decoding model decodes each sentence vector once to obtain a vocabulary, and all the obtained vocabularies form a sentence.
  • for example, given the sentence vector [vocabulary vector 1, vocabulary vector 2, vocabulary vector 3], the decoding model decodes the sentence vector for the first time to obtain the word "The", decodes the sentence vector for the second time to obtain the word "housing", and so on, until the sentence vector is decoded for the sixth time to obtain the word "rise"; the six obtained words form the sentence "The housing prices continued to rise".
  • the embodiments of the present application can be implemented in a terminal, a server, or a terminal and a server.
  • the terminal 11 is used to generate a source sentence and send the source sentence to the server 12; after processing the source sentence, the server 12 sends the processing result to the terminal 11 for display.
  • the terminal 11 and the server 12 are connected through a communication network, which may be a wired network or a wireless network, which is not limited in this embodiment of the present application.
  • the machine learning model for machine translation is stored in the server 12. After the user enters the source sentence to be translated “House prices continue to rise” in the terminal 11, the terminal 11 sends the source sentence to the server 12. The source sentence is translated through the machine learning model to obtain the target sentence, and the target sentence is sent to the terminal 11 for display.
  • the machine learning model in this application includes an encoding model and a decoding model.
  • the structure of the encoding model is first introduced below.
  • the coding model includes n cascaded processing nodes 201, and each processing node 201 includes a cascaded first unit and at least one second unit; the first unit may be cascaded with the first second unit.
  • optionally, the processing node 201 may also include multiple first units, and the last first unit may be cascaded with the first second unit.
  • the first unit is indicated by a hatched frame
  • the second unit is indicated by a white frame.
  • each processing node 201 includes, in order: a first unit, a second unit, ..., a second unit, where n ≥ 2.
  • the first unit is a GRU with both non-linear and linear operation capabilities, such as an L-GRU or another GRU variant with a linear transformation enhancement;
  • the second unit is T-GRU.
  • the following introduces GRU, L-GRU and T-GRU respectively.
  • GRU (Gated Recurrent Unit): the hidden state is computed as h_i = (1 - z_i) ⊙ h_{i-1} + z_i ⊙ h̃_i, where ⊙ denotes the element product (element-wise multiplication) operation.
  • the candidate activation is h̃_i = tanh(W_xh·x_i + r_i ⊙ (W_hh·h_{i-1})), where tanh is the hyperbolic tangent function; the reset gating is r_i = σ(W_xr·x_i + W_hr·h_{i-1}) and the update gating is z_i = σ(W_xz·x_i + W_hz·h_{i-1}), where σ is the activation function (sigmoid).
  • the update gating z_i is used to measure the ratio of h_i that comes from x_i and from h_{i-1}.
  • T-GRU (Transition GRU, transition gated recurrent unit).
  • a T-GRU does not appear at the first layer of the machine learning model, so a T-GRU has no input x_i; its hidden state is h_i = (1 - z_i) ⊙ h_{i-1} + z_i ⊙ h̃_i, with candidate activation h̃_i = tanh(r_i ⊙ (W_hh·h_{i-1})), reset gating r_i = σ(W_hr·h_{i-1}) and update gating z_i = σ(W_hz·h_{i-1}).
  • the update gating z_i measures how much of h_i is taken from the candidate activation and how much is taken directly from h_{i-1}; the reset gating r_i controls how much of h_{i-1} contributes to the candidate activation h̃_i.
  • L-GRU (Linear Transformation enhanced GRU, linear-transformation-enhanced gated recurrent unit): h_i = (1 - z_i) ⊙ h_{i-1} + z_i ⊙ h̃_i, with candidate activation h̃_i = tanh(W_xh·x_i + r_i ⊙ (W_hh·h_{i-1})) + l_i ⊙ H(x_i), where H(x_i) = W_x·x_i is a linear transformation of the input and l_i = σ(W_xl·x_i + W_hl·h_{i-1}) is the linear transformation gating.
  • the update gating z_i is used to measure the ratio of h_i that comes from x_i and from h_{i-1}.
  • the linear transformation gating l_i is used to control whether the candidate activation function value includes the linear transformation function value, that is, to strengthen the candidate activation function value so that it contains the linear transformation result of x_i to a certain extent.
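  • The following is a minimal NumPy sketch of the three units described above. It is illustrative only: the weight names (W_xz, W_hz, W_x, ...) and the use of a single parameter dictionary per unit are assumptions, not the notation of this application.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h_prev, p):
    """Standard GRU: h_i = (1 - z) * h_prev + z * candidate."""
    z = sigmoid(p["W_xz"] @ x + p["W_hz"] @ h_prev)          # update gating
    r = sigmoid(p["W_xr"] @ x + p["W_hr"] @ h_prev)          # reset gating
    cand = np.tanh(p["W_xh"] @ x + r * (p["W_hh"] @ h_prev)) # candidate activation
    return (1.0 - z) * h_prev + z * cand

def t_gru_cell(h_prev, p):
    """Transition GRU: no input x, the state depends only on h_prev."""
    z = sigmoid(p["W_hz"] @ h_prev)
    r = sigmoid(p["W_hr"] @ h_prev)
    cand = np.tanh(r * (p["W_hh"] @ h_prev))
    return (1.0 - z) * h_prev + z * cand

def l_gru_cell(x, h_prev, p):
    """Linear-transformation-enhanced GRU: the candidate also carries l * (W_x @ x)."""
    z = sigmoid(p["W_xz"] @ x + p["W_hz"] @ h_prev)
    r = sigmoid(p["W_xr"] @ x + p["W_hr"] @ h_prev)
    l = sigmoid(p["W_xl"] @ x + p["W_hl"] @ h_prev)          # linear transformation gating
    cand = np.tanh(p["W_xh"] @ x + r * (p["W_hh"] @ h_prev)) + l * (p["W_x"] @ x)
    return (1.0 - z) * h_prev + z * cand
```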
  • FIG. 3 shows a method flowchart of a sentence processing method provided by some embodiments of the present application.
  • the sentence processing method includes:
  • Step 301 Perform word segmentation on the source sentence to be encoded to obtain m words.
  • the source sentence refers to a sentence corresponding to a natural language.
  • the source sentence can be input by the user or selected by the user from the text.
  • the source sentence is a sentence to be translated input by the user; optionally, the source sentence may also be generated by the user selecting text while browsing. For example, when the user selects the text content "housing prices continue to rise" while browsing an article and then selects the translation option, the selected text content is the source sentence.
  • word segmentation operation can be performed on the source sentence. This embodiment does not limit the operation method of word segmentation operation.
  • each vocabulary in the source sentence corresponds to a processing node, so the number m of vocabularies obtained by word segmentation needs to be less than or equal to the number n of processing nodes in the machine learning model, that is, m ≤ n.
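  • As a minimal illustration of this constraint (the segmentation output below is hypothetical; the embodiments do not limit how the segmentation itself is performed):

```python
# Hypothetical output of a word segmenter for the example source sentence
# "housing prices continue to rise".
vocabularies = ["housing prices", "continue", "rise"]   # m = 3 vocabularies

n_processing_nodes = 6        # assumed number of processing nodes in the coding model
m = len(vocabularies)
assert m <= n_processing_nodes, "each vocabulary needs its own processing node (m <= n)"
```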
  • the coding model can perform steps 302 and 303; after obtaining a vocabulary vector, it updates i to i+1 and continues to perform steps 302 and 303 to obtain the next vocabulary vector, and so on, until i is updated to m and the m-th vocabulary vector is obtained, at which point the loop stops and step 304 is executed.
  • Step 302 Use the i-th processing node of the n processing nodes to obtain the i-th vocabulary of the m vocabularies, and obtain the i-1th vocabulary vector obtained by the i-1th processing node.
  • the (i-1)th vocabulary vector is the coding vector of the (i-1)th vocabulary among the m vocabularies.
  • the m vocabularies are sorted according to the position of each vocabulary in the source sentence. For example, if the source sentence is "housing prices continue to rise", then the three sorted vocabularies are "housing prices", "continue" and "rise".
  • the (i-1)th vocabulary vector is generated based on the first i-1 vocabularies of the m vocabularies, but it represents the coding vector of the (i-1)th vocabulary.
  • h_1 in FIG. 2 is generated based on the first vocabulary, and it represents the first vocabulary vector;
  • h_2 is generated based on the first and second vocabularies, and it represents the second vocabulary vector.
  • Step 303 Perform linear and non-linear operations on the i-th vocabulary and the (i-1)th vocabulary vector by using the first unit in the i-th processing node, and output the obtained i-th operation result to the at least one second unit for processing to obtain the i-th vocabulary vector.
  • the process of obtaining a vocabulary vector at each moment in the coding model is introduced.
  • the time from receiving data to outputting data by the processing node is referred to as a moment, also referred to as a time step.
  • the L-GRU in the first processing node receives the first vocabulary x_1 in the source sentence, performs a linear operation and a non-linear operation on x_1 according to its calculation formula, and outputs the result to the first T-GRU in the first processing node; the first T-GRU processes the received data according to its calculation formula and outputs it to the second T-GRU in the first processing node, and so on, until the last T-GRU in the first processing node processes the received data according to its calculation formula to obtain h_1, where h_1 is the vocabulary vector corresponding to x_1.
  • the L-GRU in the second processing node receives the second vocabulary x_2 in the source sentence and h_1 obtained by the first processing node, performs a linear operation and a non-linear operation on x_2 and h_1 according to its calculation formula, and outputs the result to the first T-GRU in the second processing node; the first T-GRU processes the received data according to its calculation formula and outputs it to the second T-GRU in the second processing node, and so on, to obtain h_2.
  • by analogy, h_m is finally obtained.
  • the number of T-GRUs in the processing node may be preset.
  • the number of T-GRUs has a positive correlation with the accuracy of sentence processing, that is, the greater the number of T-GRUs, the higher the accuracy of sentence processing.
  • the number of T-GRUs can be set according to the user's demand for accuracy and efficiency of sentence processing.
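  • The per-node computation and the word-by-word loop described above can be sketched as follows (reusing l_gru_cell and t_gru_cell from the GRU sketch above; the per-unit parameter dictionaries and embedding inputs are assumptions for illustration):

```python
import numpy as np

# Reuses l_gru_cell(...) and t_gru_cell(...) from the GRU sketch above.
def encode(word_embeddings, lgru_params, tgru_params_list):
    """Front-to-back encoding: one processing node (L-GRU + cascaded T-GRUs) per vocabulary.

    word_embeddings: list of m input vectors x_1 .. x_m
    Returns the m vocabulary vectors h_1 .. h_m.
    """
    hidden_size = lgru_params["W_hh"].shape[0]
    h = np.zeros(hidden_size)                  # no previous vocabulary vector before x_1
    vocabulary_vectors = []
    for x in word_embeddings:
        state = l_gru_cell(x, h, lgru_params)  # first unit: mixes x_i with h_{i-1}
        for p in tgru_params_list:             # cascaded second units (T-GRUs)
            state = t_gru_cell(state, p)
        h = state                              # h_i, the i-th vocabulary vector
        vocabulary_vectors.append(h)
    return vocabulary_vectors
```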
  • step 304 after obtaining m vocabulary vectors, a sentence vector is generated according to the m vocabulary vectors, and the sentence vector is used to determine a target sentence or a target classification.
  • the m vocabulary vectors are sorted according to the position, in the source sentence, of the vocabulary corresponding to each vocabulary vector. For example, "housing prices" corresponds to vocabulary vector 1, "continue" corresponds to vocabulary vector 2, and "rise" corresponds to vocabulary vector 3.
  • the resulting sentence vector is [vocabulary vector 1, vocabulary vector 2, vocabulary vector 3].
  • the decoding model can be used to decode the sentence vector to obtain the target sentence or target classification.
  • the sentence vector is used for the decoding model to generate a target sentence
  • the target sentence refers to a sentence corresponding to a natural language.
  • the natural language corresponding to the source sentence and the natural language corresponding to the target sentence are different.
  • for example, the natural language corresponding to the source sentence is Chinese and the natural language corresponding to the target sentence is English; or the natural language corresponding to the source sentence is French and the natural language corresponding to the target sentence is English; or the natural language corresponding to the source sentence is English and the natural language corresponding to the target sentence is Chinese.
  • the natural language corresponding to the source sentence and the natural language corresponding to the target sentence may be the same or different.
  • the sentence vector is used to determine the target classification.
  • the target classification is sentiment classification.
  • the target classification is part-of-speech classification.
  • the target classification is named entity classification.
  • the first unit in the processing node can perform linear and non-linear operations on the i-th vocabulary and the (i-1)th vocabulary vector; that is, the vocabulary vector of the current vocabulary can be determined according to the context, so a more accurate vocabulary vector can be extracted.
  • the machine learning model needs to rely on the weights obtained by training when processing sentences, and training involves a back-propagation algorithm: the error between the output and the reference result is propagated backwards along the output path of the training data so that the weights can be modified according to the error.
  • during back-propagation, the gradient of the error in the machine learning model decreases exponentially until it vanishes, so the earlier weights in the machine learning model update more slowly and the later weights update more quickly, which makes the trained weights inaccurate and lowers the accuracy of sentence processing; therefore, when the coding model is trained to obtain its weights, the first unit also performs linear and non-linear operations on the training data and outputs the result in the same way.
  • in this case, the back-propagated error includes both a linear operation part and a non-linear operation part. Since the gradient of the error of the linear operation part is constant, it slows down the decline of the gradient of the entire error, which alleviates the problem that the weights of the coding model become inaccurate because the gradient of the entire error decreases exponentially until it vanishes, thereby improving the accuracy of sentence processing.
  • the type of the coding model can also be set according to the coding direction of the coding model, three of which are described below.
  • the i-th processing node is the processing node 201 arranged at the i-th position among the n processing nodes in front-to-back order; the i-th vocabulary is the vocabulary arranged at the i-th position among the m vocabularies in front-to-back order.
  • in the left-to-right direction, the first vocabulary processed by the first processing node 201 is "housing prices", the second vocabulary processed by the second processing node 201 is "continue", and the third vocabulary processed by the third processing node 201 is "rise".
  • the i-th processing node is the processing node 401 arranged at the i-th position among the n processing nodes in back-to-front order; the i-th vocabulary is the vocabulary arranged at the i-th position among the m vocabularies in back-to-front order.
  • in the right-to-left direction, the first vocabulary processed by the first processing node 401 is "rise", the second vocabulary processed by the second processing node 401 is "continue", and the third vocabulary processed by the third processing node 401 is "housing prices".
  • Bidirectional coding model: the coding direction includes both front-to-back and back-to-front;
  • the i-th processing node includes the processing node 501 arranged at the i-th position among the n processing nodes in front-to-back order and the processing node 502 arranged at the i-th position among the n processing nodes in back-to-front order;
  • the i-th vocabulary includes the vocabulary arranged at the i-th position among the m vocabularies in front-to-back order and the vocabulary arranged at the i-th position among the m vocabularies in back-to-front order.
  • in the left-to-right direction, the first vocabulary processed by the first processing node 501 is "housing prices", the second vocabulary processed by the second processing node 501 is "continue", and the third vocabulary processed by the third processing node 501 is "rise"; in the right-to-left direction, the first vocabulary processed by the first processing node 502 is "rise", the second vocabulary processed by the second processing node 502 is "continue", and the third vocabulary processed by the third processing node 502 is "housing prices".
  • FIG. 6 illustrates a method flowchart of a sentence processing method provided by another embodiment of the present application.
  • This statement processing method includes:
  • Step 601 Perform word segmentation on the source sentence to be encoded to obtain m words.
  • the coding model can perform steps 602-607; after obtaining a vocabulary vector, it updates i to i+1 and continues to perform steps 602-607 to obtain the next vocabulary vector, and so on, until i is updated to m and the m-th vocabulary vector is obtained, at which point the loop stops and step 608 is executed.
  • Step 602 Use the i-th processing node of the n processing nodes to obtain the i-th vocabulary of the m vocabularies, and obtain the i-1th vocabulary vector obtained by the i-1th processing node.
  • the (i-1)th vocabulary vector is the coding vector of the (i-1)th vocabulary among the m vocabularies.
  • step 603 the first unit is used to perform an element product operation on the i-1th vocabulary vector and the first difference value to obtain a first product.
  • the first difference is equal to the predetermined value minus the update gating of the first unit.
  • the predetermined value may be 1 or other values, which is not limited in this embodiment.
  • the update gating is used to measure the ratio of the i-th vocabulary vector that comes from the i-th vocabulary and from the (i-1)th vocabulary vector.
  • the calculation formula for updating the gating is detailed in the description in the L-GRU and will not be repeated here.
  • step 604 the first unit is used to linearly transform the ith vocabulary through a linear transformation function, and the obtained linear transformation function value and linear transformation gating are subjected to element product operation to obtain a second product;
  • the i-th vocabulary and the i-1th vocabulary vector are nonlinearly transformed, and the obtained hyperbolic tangent function value is added to the second product to obtain a candidate activation function value.
  • Linear transformation gating is used to control candidate activation function values to include linear transformation function values.
  • linear transformation gating please refer to the description in L-GRU, which will not be repeated here.
  • step 605 the first unit is used to perform an element product operation on the update gating and the candidate activation function value to obtain a third product.
  • step 606 the first unit is used to add the first product and the third product to obtain the ith operation result.
  • the process of processing the data in steps 603-606 is the process of processing the data by the L-GRU in a processing node according to the calculation formula. For details, see the calculation formula of the L-GRU above, which will not be repeated here.
  • step 607 the first unit in the i-th processing node is used to output the obtained i-th operation result to at least one second unit for processing to obtain the i-th vocabulary vector.
  • the L-GRU in the i-th processing node outputs the obtained i-th operation result to the first T-GRU in the i-th processing node; the first T-GRU in the i-th processing node processes the received data and outputs it to the second T-GRU in the i-th processing node, and so on, until the last T-GRU in the i-th processing node processes the received data according to its calculation formula to obtain the i-th vocabulary vector, which is the vocabulary vector corresponding to the i-th vocabulary.
  • the output of the k-th T-GRU in the processing node is computed according to the T-GRU calculation formula, where 1 ≤ k ≤ l_s and l_s is the number of T-GRUs in the processing node.
  • if the coding model is a unidirectional coding model and the coding direction is from front to back, the i-th vocabulary vector is the forward vector obtained by processing node 201; if the coding model is a unidirectional coding model and the coding direction is from back to front, the i-th vocabulary vector is the backward vector obtained by processing node 401; if the coding model is a bidirectional coding model and the coding direction includes both front-to-back and back-to-front, the i-th vocabulary vector is obtained by combining (for example, concatenating) the forward vector obtained by processing node 501 and the backward vector obtained by processing node 502.
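  • For the bidirectional case, a sketch of how the two directions can be combined (reusing the encode sketch above; per-position concatenation is one common choice and is an assumption here, since the exact combination is not detailed in this text):

```python
import numpy as np

# Reuses encode(...) from the sketch above.
def encode_bidirectional(word_embeddings, fwd_lgru, fwd_tgrus, bwd_lgru, bwd_tgrus):
    """Run one pass front-to-back and one pass back-to-front, then pair the results per word."""
    forward = encode(word_embeddings, fwd_lgru, fwd_tgrus)          # forward vectors for words 1..m
    backward = encode(word_embeddings[::-1], bwd_lgru, bwd_tgrus)   # backward vectors, in reverse order
    backward = backward[::-1]                                       # realign with word positions
    # i-th vocabulary vector: combination (here: concatenation) of the two vectors for word i.
    return [np.concatenate([f, b]) for f, b in zip(forward, backward)]
```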
  • Step 608 After obtaining m vocabulary vectors, generate a sentence vector according to the m vocabulary vectors.
  • the sentence vector is used to generate a target sentence, and the target sentence and the source sentence correspond to different natural languages.
  • the first unit in the processing node can perform linear operation and non-linear operation on the i-th vocabulary and the i-1 vocabulary vector.
  • the first unit will also perform linear and nonlinear operations on the training data and output it.
  • the decoding model includes a processing node, and the processing node includes a cascaded first unit and at least one second unit.
  • the first unit is a GRU with non-linear computing capabilities and linear computing capabilities, such as the above-mentioned L-GRU or GRU that makes other linear transformation improvements to the GRU;
  • the second unit is T-GRU.
  • the sentence decoding method includes:
  • step 701 at the jth moment, a sentence vector and a jth query state are obtained.
  • the sentence vector is obtained after the source sentence to be encoded is encoded by the coding model.
  • the jth query state is used to query the coded part of the source sentence at the jth moment.
  • the sentence vector may be generated according to the source sentence by the coding model shown in FIGS. 2-6, or may be generated according to the source sentence by other coding models, which is not limited in this embodiment.
  • the source sentence refers to a sentence corresponding to a natural language.
  • the query state indicates the historical state obtained up to the current moment; it is used to query the source sentence to obtain the part of the source sentence most likely to be encoded at the next moment. This part can be a character, a word, a phrase, a discontinuous segment, etc., which is not limited in this embodiment.
  • Step 702 Generate a jth source language attention context based on the sentence vector and the jth query state, where the jth source language attention context is the coded part of the source sentence at the jth time.
  • the decoding model can use an attention operation on the sentence vector and the jth query state to generate the jth source language attention context.
  • the source language attention context is the part of the source sentence most likely to be encoded at the current moment; see the description below for details.
  • Step 703 Use the first unit in the processing node to perform linear and non-linear operations on the jth query state and the jth source language attention context, and output the obtained jth operation result to the at least one second unit in the processing node for processing to obtain the jth vocabulary.
  • the L-GRU in the processing node can perform a linear operation and a non-linear operation on the jth query state and the jth source language attention context according to the calculation formula introduced above, and output the obtained jth operation result to the first T-GRU in the processing node; the first T-GRU in the processing node processes the received data and outputs it to the next T-GRU in the processing node, and so on.
  • the decoding model can perform steps 701-703; after obtaining a vocabulary, it updates j to j+1 and continues to perform steps 701-703 to obtain the next vocabulary, and so on, until j is updated to k and the k-th vocabulary is obtained, at which point the loop stops and step 704 is executed.
  • step 704 after obtaining k vocabularies, a target sentence is generated according to the k vocabularies, and the target sentence and the source sentence correspond to different natural languages.
  • the k vocabularies are sorted according to the generation order of each vocabulary to obtain the target sentence. For example, if the first vocabulary obtained by the decoding model is "The", the second vocabulary is "housing", the third vocabulary is "prices", the fourth vocabulary is "continued", the fifth vocabulary is "to", and the sixth vocabulary is "rise", then the target sentence is "The housing prices continued to rise".
  • the natural language corresponding to the source sentence and the natural language corresponding to the target sentence are different.
  • for example, the natural language corresponding to the source sentence is Chinese and the natural language corresponding to the target sentence is English; or the natural language corresponding to the source sentence is French and the natural language corresponding to the target sentence is English; or the natural language corresponding to the source sentence is English and the natural language corresponding to the target sentence is Chinese.
  • the natural language corresponding to the source sentence and the natural language corresponding to the target sentence may be the same or different, and this embodiment is not limited.
  • the first unit in the processing node can perform a linear operation and a non-linear operation on the jth query state and the jth source language attention context.
  • the first unit will also perform linear operation and nonlinear operation on the training data and output it.
  • the back-propagated error includes both a linear operation part and a non-linear operation part. Since the gradient of the error of the linear operation part is constant, it slows down the decline of the gradient of the entire error, which alleviates the problem that the weights of the decoding model become inaccurate because the gradient of the entire error decreases exponentially until it vanishes, thereby improving the accuracy of sentence processing.
  • the decoding model includes one processing node; of course, it may also include multiple processing nodes, and each processing node may include a cascaded first unit and at least one second unit.
  • the first unit is a GRU with non-linear computing capabilities and linear computing capabilities, such as the above-mentioned L-GRU or GRU that makes other linear transformation improvements to the GRU;
  • the second unit is T-GRU.
  • the sentence decoding method includes:
  • step 801 at the jth moment, a sentence vector and a jth query state are obtained.
  • the sentence vector is obtained after the source sentence to be encoded is encoded by the coding model.
  • the jth query state is used to query the coded part of the source sentence at the jth moment.
  • the decoding model may obtain the jth query state through a query node, which is connected to the processing node.
  • three implementations of the query node are described below.
  • the query node includes a first unit and at least one second unit
  • acquiring the jth query state includes: using the first unit in the query node to acquire the (j-1)th decoding state and the (j-1)th vocabulary, where the (j-1)th decoding state is obtained by the processing node according to the (j-1)th operation result and is used to determine the (j-1)th vocabulary; and using the first unit in the query node to perform a linear operation and a non-linear operation on the (j-1)th decoding state and the (j-1)th vocabulary, and outputting the obtained intermediate operation result to the at least one second unit in the query node for processing to obtain the jth query state.
  • the first unit is L-GRU and the second unit is T-GRU
  • hatched boxes represent L-GRU
  • white boxes represent T-GRU
  • dashed boxes 901 represent processing nodes.
  • the query node is represented by a dotted box 902.
  • the L-GRU in the query node can perform linear and non-linear operations on the (j-1)th decoding state and the (j-1)th vocabulary according to the calculation formula introduced above, and output the obtained intermediate operation result to the first T-GRU in the query node; the first T-GRU in the query node processes the received data and outputs it to the second T-GRU in the query node, and so on, until the last T-GRU in the query node processes the received data according to its calculation formula to obtain the jth query state.
  • s represents the decoding state
  • y represents the vocabulary in the target sentence.
  • in this implementation, this application not only deepens the query node, that is, adds T-GRUs to the query node, thereby improving the learning ability of the decoding model, but also changes the GRU to an L-GRU, thereby improving the accuracy of the weights of the decoding model and thus the accuracy of sentence processing.
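  • A sketch of this first query-node variant (reusing l_gru_cell and t_gru_cell from the GRU sketch above; treating the (j-1)th vocabulary as an embedding vector and using per-unit parameter dictionaries are assumptions for illustration):

```python
# Reuses l_gru_cell(...) and t_gru_cell(...) from the GRU sketch above.
def query_node(y_prev_embedding, s_prev, lgru_params, tgru_params_list):
    """Query-node variant 1: an L-GRU over (y_{j-1}, s_{j-1}) followed by cascaded T-GRUs.

    y_prev_embedding: embedding of the (j-1)th target-sentence vocabulary
    s_prev:           the (j-1)th decoding state
    Returns v_j, the jth query state.
    """
    state = l_gru_cell(y_prev_embedding, s_prev, lgru_params)  # linear + non-linear operation
    for p in tgru_params_list:                                 # deepen the query node with T-GRUs
        state = t_gru_cell(state, p)
    return state
```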
  • the query node includes a first unit
  • obtaining the jth query state includes: using the first unit in the query node to obtain the (j-1)th decoding state and the (j-1)th vocabulary,
  • where the (j-1)th decoding state is obtained by the processing node according to the (j-1)th operation result and is used to determine the (j-1)th vocabulary;
  • and using the first unit in the query node to perform a linear operation and a non-linear operation on the (j-1)th decoding state and the (j-1)th vocabulary to directly obtain the jth query state.
  • the L-GRU in the query node can perform a linear operation and a non-linear operation on the (j-1)th decoding state and the (j-1)th vocabulary according to the calculation formula introduced above to directly obtain the jth query state.
  • the modification of the GRU to L-GRU in this application improves the accuracy of the weight of the decoding model to improve the accuracy of sentence processing.
  • the query node includes a third unit and at least one second unit, and obtaining the jth query state includes: using the third unit in the query node to obtain the (j-1)th decoding state and the (j-1)th vocabulary, where the (j-1)th decoding state is obtained by the processing node according to the (j-1)th operation result and is used to determine the (j-1)th vocabulary; and using the third unit in the query node to perform a non-linear operation on the (j-1)th decoding state and the (j-1)th vocabulary, and outputting the obtained intermediate operation result to the at least one second unit in the query node for processing to obtain the jth query state.
  • the third unit is GRU and the second unit is T-GRU
  • black boxes represent GRUs
  • white boxes represent T-GRUs
  • dashed boxes 901 represent processing nodes
  • dashed box 902 represents the query node.
  • the GRU in the query node can perform a non-linear operation on the (j-1)th decoding state and the (j-1)th vocabulary according to the calculation formula introduced above, and output the obtained intermediate operation result to the first T-GRU in the query node; the first T-GRU in the query node processes the received data and outputs it to the second T-GRU in the query node, and so on, until the last T-GRU in the query node processes the received data according to its calculation formula to obtain the jth query state.
  • Step 802 When the decoding model further includes an attention operation node, the attention operation node is used to perform an attention operation on the sentence vector and the jth query state to obtain the jth source language attention context, and the jth source language attention context is the coded part of the source sentence at the jth moment.
  • the attention computing node is respectively connected to the coding model, the query node and the processing node.
  • the dotted box 903 in FIG. 9-11 represents the attention operation node.
  • the attention operation node in this application may be a multi-head attention operation model, or other attention models such as a traditional attention calculation model, local and global attention models, etc., which is not limited in this embodiment.
  • the jth source language attention context is c_j = Attention(C, v_j), where C is the sentence vector and v_j is the jth query state.
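  • A sketch of one possible attention operation (the embodiments allow various attention models; the dot-product scoring below is an assumption, and it further assumes the query state and the vocabulary vectors share the same dimension):

```python
import numpy as np

def attention(sentence_vector, query_state):
    """Compute c_j as a weighted sum of the vocabulary vectors: c_j = sum_i a_i * h_i.

    sentence_vector: list of m vocabulary vectors h_1 .. h_m (the sentence vector C)
    query_state:     the jth query state v_j
    Returns c_j, the jth source language attention context.
    """
    H = np.stack(sentence_vector)          # shape (m, d)
    scores = H @ query_state               # dot-product scoring (assumed choice)
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()      # softmax over the m source positions
    return weights @ H                     # weighted sum of the vocabulary vectors
```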
  • step 803 the first unit is used to perform an element product operation on the jth query state and the first difference to obtain a first product, and the first difference is equal to a predetermined value minus the update gating of the first unit.
  • the predetermined value may be 1 or other values, which is not limited in this embodiment.
  • the update gating is used to measure the ratio of the jth operation result that comes from the jth source language attention context and from the jth query state.
  • the calculation formula for updating the gating is detailed in the description in the L-GRU and will not be repeated here.
  • Step 804 Use the first unit to linearly transform the jth source language attention context through a linear transformation function, and perform an element product operation on the obtained linear transformation function value and the linear transformation gating to obtain a second product;
  • a non-linear transformation is performed on the jth source language attention context and the jth query state, and the obtained hyperbolic tangent function value is added to the second product to obtain the candidate activation function value.
  • the linear transformation gating is used to control the candidate activation function value to include the linear transformation function value.
  • the calculation formula of linear transformation gating please refer to the description in L-GRU, which will not be repeated here.
  • step 805 the first unit is used to perform an element product operation on the update gating and the candidate activation function value to obtain a third product.
  • step 806 the first unit is used to add the first product to the third product to obtain the jth operation result.
  • the processing of data in steps 803-806 is the processing of data by the L-GRU in the processing node according to its calculation formula, where c is the source language attention context and v is the query state; the details are not repeated here.
  • Step 807 Output the obtained j-th operation result to at least one second unit in the processing node for processing to obtain the j-th vocabulary.
  • the output of the p-th unit in the processing node is computed according to the T-GRU calculation formula, where 2 ≤ p ≤ l_d + 1 and l_d is the number of T-GRUs in the processing node of the decoding model.
  • the first point to note is that after the decoding model generates the jth decoding state, it also obtains the (j-1)th vocabulary, the jth decoding state, and the jth source language attention context, and calculates an output vector o_j based on these three pieces of data, using weights of the decoding model obtained through training; the decoding model then passes the output vector o_j through softmax and, according to the calculation formula, computes the probability of each word in the word list, and takes the word corresponding to the maximum probability as the jth vocabulary. Here, the word list is preset in the decoding model.
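  • A sketch of this output step; the exact combination o_j = tanh(W_o · [y_{j-1}; s_j; c_j]) is an assumed (and common) form, since the text above only states which three inputs are combined and that softmax is applied:

```python
import numpy as np

def output_word(y_prev_embedding, s_j, c_j, W_o, W_y, word_list):
    """Pick the jth word from the previous word, the jth decoding state and the attention context."""
    o_j = np.tanh(W_o @ np.concatenate([y_prev_embedding, s_j, c_j]))  # assumed combination
    logits = W_y @ o_j                                                 # one score per word in the preset word list
    probs = np.exp(logits - logits.max())
    probs = probs / probs.sum()                                        # softmax over the preset word list
    return word_list[int(np.argmax(probs))], probs                     # word with maximum probability
```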
  • at the first moment, the query node obtains the initial query state and the initial vocabulary, and processes them according to its calculation formula to obtain the first query state v_1;
  • the attention operation node obtains the sentence vector and v_1, and processes them according to its calculation formula to obtain the first source language attention context c_1;
  • the L-GRU in the processing node obtains v_1 and c_1, performs a linear operation and a non-linear operation on v_1 and c_1 according to its calculation formula, and outputs the result to the first T-GRU in the processing node; the first T-GRU in the processing node processes the received data according to its calculation formula and outputs it to the second T-GRU in the processing node, and so on, until the last T-GRU in the processing node processes the received data according to its calculation formula to obtain the first decoding state s_1;
  • the softmax layer obtains c_1 and s_1 and, according to the calculation formula, the first vocabulary y_1 is obtained;
  • at the second moment, the query node obtains the first decoding state s_1 and the first vocabulary y_1 and processes them to obtain the second query state v_2; the subsequent processing flow is the same as that at the first moment, and finally the second vocabulary y_2 is obtained.
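  • Putting these pieces together, one greedy decoding loop could look like the sketch below (reusing query_node, attention, l_gru_cell, t_gru_cell and output_word from the earlier sketches; the start/end tokens, the zero initialization of the state, and the parameter layout are assumptions for illustration):

```python
import numpy as np

# Reuses query_node, attention, l_gru_cell, t_gru_cell and output_word from the sketches above.
def decode(sentence_vector, embeddings, params, max_len=50, end_token="</s>"):
    """Greedy decoding loop: query state -> attention context -> decoding state -> word."""
    hidden_size = params["proc_lgru"]["W_hh"].shape[0]
    s = np.zeros(hidden_size)                  # initial state (assumed zero initialization)
    y_prev = embeddings["<s>"]                 # embedding of an assumed start token
    target_words = []
    for _ in range(max_len):
        v = query_node(y_prev, s, params["query_lgru"], params["query_tgrus"])   # jth query state
        c = attention(sentence_vector, v)                                        # jth attention context
        s = l_gru_cell(c, v, params["proc_lgru"])                                # first unit of the processing node
        for p in params["proc_tgrus"]:                                           # cascaded T-GRUs
            s = t_gru_cell(s, p)                                                 # jth decoding state
        word, _ = output_word(y_prev, s, c, params["W_o"], params["W_y"], params["word_list"])
        if word == end_token:                  # stop when an assumed end token is produced
            break
        target_words.append(word)
        y_prev = embeddings[word]
    return " ".join(target_words)
```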
  • Step 808: After k vocabularies are obtained, generate the target sentence according to the k vocabularies.
  • After the decoding model obtains the k vocabularies, they are sorted according to the order in which each vocabulary was generated to obtain the target sentence. For example, if the first vocabulary obtained by the decoding model is "The", the second is "housing", the third is "prices", the fourth is "continued", the fifth is "to", and the sixth is "rise", then the target sentence is "The housing prices continued to rise".
  • When the method of this embodiment is applied to a machine translation scenario, the natural language corresponding to the source sentence and the natural language corresponding to the target sentence are different. For example, the natural language corresponding to the source sentence is Chinese and that of the target sentence is English; or the source sentence corresponds to French and the target sentence to English; or the source sentence corresponds to English and the target sentence to Chinese. When the method of this embodiment is applied to a human-machine dialogue or automatic text generation scenario, the natural language corresponding to the source sentence and that of the target sentence may be the same or different; this embodiment does not limit this.
  • Compared with a query node that contains only a single GRU, this application deepens the query node, which can be done in at least one of the following ways: adding T-GRUs to the query node, thereby improving the learning ability of the decoding model; or modifying the GRU into an L-GRU, thereby improving the accuracy of the weights of the decoding model and hence the accuracy of sentence processing.
  • It should also be noted that this application not only performs non-linear operations on the data through the hyperbolic tangent function to preserve the learning ability of the machine learning model, but also performs linear operations on the data through the linear transformation function. In this way, the back-propagated error includes the error of both the linear operation part and the non-linear operation part; because the gradient of the error of the linear operation part is constant, it slows down the rate at which the gradient of the overall error decreases, which alleviates the problem that the weights become inaccurate when the gradient of the overall error decreases exponentially until it vanishes, thereby improving the accuracy of sentence processing.
  • In some embodiments, the above coding model and decoding model can also be combined to obtain a machine learning model that has both encoding and decoding capabilities, that is, any one of the coding models of FIGS. 2, 4 and 5 combined with any one of the decoding models of FIGS. 9-11. FIG. 12 illustrates such a machine learning model, taking as an example a bidirectional coding model and a decoding model whose query node includes a first unit and at least one second unit.
  • Still taking machine translation of the source sentence meaning "housing prices continue to rise" as an example, the coding model in the machine learning model shown in FIG. 12 first segments the source sentence to obtain the three vocabularies "house price", "continued" and "growth". Following the front-to-back coding direction, the first three processing nodes process these three vocabularies and yield, in turn, vocabulary vector 1 corresponding to "house price", vocabulary vector 2 corresponding to "continued" and vocabulary vector 3 corresponding to "growth"; following the back-to-front coding direction, the last three processing nodes process the same three vocabularies and yield, in turn, vocabulary vector 4 corresponding to "growth", vocabulary vector 5 corresponding to "continued" and vocabulary vector 6 corresponding to "house price". The resulting sentence vector is [vocabulary vector 1 vocabulary vector 6, vocabulary vector 2 vocabulary vector 5, vocabulary vector 3 vocabulary vector 4], which is output to the decoding model. (The pairing is sketched below.)
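  • How the forward and backward word vectors are paired into the sentence vector can be sketched as follows; pairing the i-th forward output with the backward output produced for the same source word matches the [vector 1 vector 6, vector 2 vector 5, vector 3 vector 4] layout above, while the use of concatenation as the combination operator is an assumption.

```python
import numpy as np

def build_sentence_vector(forward_vectors, backward_vectors):
    """Pair the i-th forward word vector with the backward word vector produced
    for the same source word (the (m-i+1)-th backward output) and concatenate
    each pair; the list of pairs forms the sentence vector."""
    m = len(forward_vectors)
    return [np.concatenate([forward_vectors[i], backward_vectors[m - 1 - i]])
            for i in range(m)]

# toy stand-ins: forward outputs 1..3 and backward outputs 4..6 for
# "house price", "continued", "growth"
fwd = [np.array([1.0]), np.array([2.0]), np.array([3.0])]
bwd = [np.array([4.0]), np.array([5.0]), np.array([6.0])]
sentence_vector = build_sentence_vector(fwd, bwd)   # pairs (1,6), (2,5), (3,4)
```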
  • the decoding model uses the above decoding method to decode the sentence vector.
  • the word “The” is obtained in the first decoding
  • the word “housing” is obtained in the second decoding
  • the word “prices” is obtained in the third decoding.
  • the word “continued” is obtained in the fourth decoding
  • the word “to” is obtained in the fifth decoding
  • the word "rise" is obtained in the sixth decoding; the target sentence is then "The housing prices continued to rise".
  • When the machine learning model includes the above coding model and decoding model, the number of T-GRUs in the processing nodes of the coding model, the number of T-GRUs in the processing node of the decoding model and the number of T-GRUs in the query node of the decoding model may be equal or may differ.
  • Taking the case where the numbers of T-GRUs in the above three kinds of nodes are equal (1 and 4, respectively) as an example, the BLEU indicator of machine translation was evaluated; the relevant evaluation data is shown in Table 1 below.
  • The BLEU indicator is used to evaluate the effect of machine translation; the higher the BLEU indicator, the better the effect of machine translation. (A snippet for computing BLEU with standard tooling is given below.)
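  • For reference, a BLEU score of the kind reported in these tables can be computed with standard tooling. The snippet below uses NLTK's corpus_bleu with a smoothing function; the choice of library and the toy sentence pair are illustrative assumptions, since the patent does not state how its BLEU figures were obtained.

```python
# pip install nltk
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# one hypothesis sentence with one reference translation (toy example)
references = [[["the", "housing", "prices", "continued", "to", "rise"]]]
hypotheses = [["the", "housing", "prices", "continued", "to", "rise"]]

# corpus-level BLEU-4; a higher score indicates better machine translation quality
score = corpus_bleu(references, hypotheses,
                    smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.4f}")
```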
  • The values in brackets after the BLEU indicators in Table 1 mark the gain of this application relative to the standard machine learning model. Generally, a gain exceeding 1 can be considered a significant improvement in machine translation, so this application can significantly improve the effect of machine translation.
  • Next, taking the case where all three kinds of nodes are GRU+1T-GRU, L-GRU+1T-GRU, GRU+4T-GRU or L-GRU+4T-GRU as examples, the BLEU indicator of machine translation was evaluated; the relevant evaluation data is shown in Table 2 below.
  • Table 2:
    Machine learning model | Chinese-English (BLEU)
    GRU+1T-GRU             | 43.63
    L-GRU+1T-GRU           | 44.41
    GRU+4T-GRU             | 44.16
    L-GRU+4T-GRU           | 45.04
  • Next, each of the above three kinds of nodes is set to one of L-GRU+4T-GRU or GRU, and the BLEU indicator of machine translation is evaluated; the relevant evaluation data is shown in Table 3 below, where "√" means the corresponding node is L-GRU+4T-GRU and "×" means the corresponding node is GRU.
  • According to the BLEU indicators in Table 3, the machine learning model in which all three nodes are L-GRU+4T-GRU achieves the best machine translation effect.
  • It should be understood that the steps in the embodiments of this application are not necessarily executed in the order indicated by the step numbers. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and the steps may be executed in other orders. Moreover, at least some of the steps in each embodiment may include multiple sub-steps or stages; these sub-steps or stages are not necessarily completed at the same moment but may be executed at different moments, and their execution order is not necessarily sequential; they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
  • FIG. 13 shows a structural block diagram of a sentence processing apparatus provided in some embodiments of this application, which is used in a coding model.
  • The coding model includes n cascaded processing nodes, and each processing node includes a cascaded first unit and at least one second unit, n ≥ 2.
  • the sentence processing device includes:
  • the word segmentation module 1310 is used to perform word segmentation on the source sentence to be encoded to obtain m vocabularies, m ≤ n;
  • the obtaining module 1320 is used to obtain, by the i-th processing node of the n processing nodes, the i-th vocabulary of the m vocabularies, and to obtain the (i-1)-th vocabulary vector obtained by the (i-1)-th processing node, where the (i-1)-th vocabulary vector is the coding vector of the (i-1)-th vocabulary in the m vocabularies, i ≤ m;
  • the operation module 1330 is configured to perform a linear operation and a non-linear operation on the i-th vocabulary and the (i-1)-th vocabulary vector using the first unit in the i-th processing node, and to output the obtained i-th operation result to the at least one second unit for processing to obtain the i-th vocabulary vector; and
  • the generating module 1340 is configured to generate, after m vocabulary vectors are obtained, a sentence vector according to the m vocabulary vectors, where the sentence vector is used to determine a target sentence or a target classification. (A schematic encoder pass driven by these modules is sketched below.)
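  • A schematic encoder pass driven by these modules might look like the following sketch; `segment`, `lgru_step` and `tgru_steps` are assumed callables standing in for the word segmentation, the first unit and the cascaded second units, and the unidirectional front-to-back direction is chosen only for brevity.

```python
def encode(source_sentence, segment, lgru_step, tgru_steps, h_init):
    """Schematic front-to-back encoder pass: the i-th processing node feeds the
    i-th vocabulary and the (i-1)-th vocabulary vector through the first unit
    (L-GRU) and then through the cascaded T-GRUs to obtain the i-th vocabulary
    vector; the ordered vocabulary vectors form the sentence vector."""
    words = segment(source_sentence)      # word segmentation module 1310
    h_prev, word_vectors = h_init, []
    for x in words:                       # obtaining and operation modules, node by node
        h = lgru_step(x, h_prev)          # linear and non-linear operations (first unit)
        h = tgru_steps(h)                 # at least one second unit (T-GRUs)
        word_vectors.append(h)            # i-th vocabulary vector
        h_prev = h
    return word_vectors                   # generating module 1340: the sentence vector
```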
  • In one possible implementation, when the coding model is a unidirectional coding model and the coding direction is from front to back, the i-th processing node is the processing node arranged at the i-th position among the n processing nodes in front-to-back order, and the i-th vocabulary is the vocabulary arranged at the i-th position among the m vocabularies in front-to-back order.
  • In one possible implementation, when the coding model is a unidirectional coding model and the coding direction is from back to front, the i-th processing node is the processing node arranged at the i-th position among the n processing nodes in back-to-front order, and the i-th vocabulary is the vocabulary arranged at the i-th position among the m vocabularies in back-to-front order.
  • In one possible implementation, when the coding model is a bidirectional coding model and the coding directions are front-to-back and back-to-front, m ≤ n/2; the i-th processing node includes the processing node arranged at the i-th position among the n processing nodes in front-to-back order and the processing node arranged at the i-th position among the n processing nodes in back-to-front order; and the i-th vocabulary includes the vocabulary arranged at the i-th position among the m vocabularies in front-to-back order and the vocabulary arranged at the i-th position among the m vocabularies in back-to-front order.
  • In one possible implementation, the operation module 1330 is further used to:
  • use the first unit to perform an element product operation on the (i-1)-th vocabulary vector and a first difference to obtain a first product, where the first difference equals a predetermined value minus the update gate of the first unit, and the update gate is used to measure the proportions of the i-th vocabulary vector that come from the i-th vocabulary and from the (i-1)-th vocabulary vector;
  • use the first unit to linearly transform the i-th vocabulary through a linear transformation function and perform an element product operation on the obtained linear transformation function value and the linear transformation gate to obtain a second product; perform a non-linear transformation on the i-th vocabulary and the (i-1)-th vocabulary vector through a hyperbolic tangent function, and add the obtained hyperbolic tangent function value to the second product to obtain a candidate activation function value, where the linear transformation gate is used to control the candidate activation function value so that it includes the linear transformation function value;
  • use the first unit to perform an element product operation on the update gate and the candidate activation function value to obtain a third product; and
  • use the first unit to add the first product and the third product to obtain the i-th operation result. (These operations are summarised as equations below.)
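  • The element-wise operations just listed correspond to the following L-GRU update (a transcription consistent with the gate formulas given elsewhere in this document; the placement of the reset gate inside the tanh follows the common GRU convention, since the original equation images are not reproduced here):

```latex
\begin{aligned}
z_i &= \sigma(W_{xz} x_i + W_{hz} h_{i-1}) && \text{update gate}\\
r_i &= \sigma(W_{xr} x_i + W_{hr} h_{i-1}) && \text{reset gate}\\
l_i &= \sigma(W_{xl} x_i + W_{hl} h_{i-1}) && \text{linear transformation gate}\\
\tilde{h}_i &= \tanh\big(W_{xh} x_i + r_i \odot (W_{hh} h_{i-1})\big) + l_i \odot W_x x_i && \text{candidate activation}\\
h_i &= (1 - z_i) \odot h_{i-1} + z_i \odot \tilde{h}_i && \text{$i$-th operation result}
\end{aligned}
```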
  • With the sentence processing apparatus provided by the embodiments of this application, the first unit in a processing node can perform both a linear operation and a non-linear operation on the i-th vocabulary and the (i-1)-th vocabulary vector. Therefore, when the coding model is trained to obtain its weights, the first unit also performs linear and non-linear operations on the training data before producing its output.
  • FIG. 14 shows a structural block diagram of a sentence decoding apparatus provided in still another embodiment of this application, which is used in a decoding model.
  • The decoding model includes a processing node, and the processing node includes a cascaded first unit and at least one second unit.
  • the sentence decoding device includes:
  • the obtaining module 1410 is used to obtain, at the j-th time, the sentence vector and the j-th query state, where the sentence vector is obtained after the coding model encodes the source sentence to be encoded, and the j-th query state is used to query the encoded portion of the source sentence at the j-th time;
  • the generating module 1420 is configured to generate the j-th source-language attention context according to the sentence vector and the j-th query state, where the j-th source-language attention context is used to indicate the encoded portion of the source sentence at the j-th time;
  • the operation module 1430 is used to perform a linear operation and a non-linear operation on the j-th query state and the j-th source-language attention context using the first unit in the processing node, and to output the obtained j-th operation result to the at least one second unit in the processing node for processing to obtain the j-th vocabulary; and
  • the generating module 1420 is further used to generate, after k vocabularies are obtained, a target sentence according to the k vocabularies, j ≤ k.
  • In one possible implementation, the decoding model further includes a query node connected to the processing node, and the query node includes a first unit; the obtaining module 1410 is further used to:
  • use the first unit in the query node to obtain the (j-1)-th decoding state and the (j-1)-th vocabulary, where the (j-1)-th decoding state is obtained by the processing node according to the (j-1)-th operation result and is used to determine the (j-1)-th vocabulary; and
  • use the first unit in the query node to perform a linear operation and a non-linear operation on the (j-1)-th decoding state and the (j-1)-th vocabulary to obtain the j-th query state.
  • In one possible implementation, the decoding model further includes a query node connected to the processing node, and the query node includes a first unit and at least one second unit; the obtaining module 1410 is further used to:
  • use the first unit in the query node to obtain the (j-1)-th decoding state and the (j-1)-th vocabulary, where the (j-1)-th decoding state is obtained by the processing node according to the (j-1)-th operation result and is used to determine the (j-1)-th vocabulary; and
  • use the first unit in the query node to perform a linear operation and a non-linear operation on the (j-1)-th decoding state and the (j-1)-th vocabulary, and output the obtained intermediate operation result to the at least one second unit in the query node for processing to obtain the j-th query state.
  • In one possible implementation, the decoding model further includes a query node connected to the processing node, and the query node includes a third unit and at least one second unit; the obtaining module 1410 is further used to:
  • use the third unit in the query node to obtain the (j-1)-th decoding state and the (j-1)-th vocabulary, where the (j-1)-th decoding state is obtained by the processing node according to the (j-1)-th operation result and is used to determine the (j-1)-th vocabulary; and
  • use the third unit in the query node to perform a non-linear operation on the (j-1)-th decoding state and the (j-1)-th vocabulary, and output the obtained intermediate operation result to the at least one second unit in the query node for processing to obtain the j-th query state. (The three query-node variants are sketched below.)
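  • The three query-node variants described above differ only in which unit produces the intermediate result and whether T-GRUs refine it. The sketch below makes this explicit; `lgru`, `gru` and `tgrus` are assumed callables, and treating the (j-1)-th vocabulary as the input and the (j-1)-th decoding state as the recurrent state is an assumption.

```python
def query_state_first_unit_only(lgru, s_prev, y_prev):
    """Variant with a single first unit: the L-GRU maps the (j-1)-th decoding
    state and the (j-1)-th vocabulary directly to the j-th query state."""
    return lgru(y_prev, s_prev)

def query_state_first_unit_plus_tgrus(lgru, tgrus, s_prev, y_prev):
    """Variant with a first unit and at least one second unit: the L-GRU produces
    an intermediate result that the cascaded T-GRUs refine into the j-th query state."""
    return tgrus(lgru(y_prev, s_prev))

def query_state_third_unit_plus_tgrus(gru, tgrus, s_prev, y_prev):
    """Variant with a third unit and at least one second unit: a plain GRU
    (non-linear only) followed by the cascaded T-GRUs produces the j-th query state."""
    return tgrus(gru(y_prev, s_prev))
```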
  • In one possible implementation, the decoding model further includes an attention operation node, which is connected to the coding model, the query node and the processing node respectively; the generating module 1420 is further used to:
  • use the attention operation node to perform an attention operation on the sentence vector and the j-th query state to obtain the j-th source-language attention context. (A minimal attention sketch is given below.)
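  • The attention operation itself is not spelled out in this passage (a multi-attention model is mentioned elsewhere as one option), so the single-head dot-product attention below is only one plausible realisation, shown for illustration.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_context(sentence_vector, v_j):
    """Score every word vector in the sentence vector against the j-th query
    state, normalise the scores with softmax, and return the weighted sum of
    word vectors as the j-th source-language attention context c_j."""
    H = np.stack(sentence_vector)   # (m, d) matrix of word vectors
    weights = softmax(H @ v_j)      # one attention weight per source word
    return weights @ H              # c_j

# toy usage with m = 3 word vectors of dimension d = 2
toy_sentence_vector = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
c_j = attention_context(toy_sentence_vector, np.array([0.5, 0.5]))
```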
  • In one possible implementation, the operation module 1430 is further used to:
  • use the first unit to perform an element product operation on the j-th query state and a first difference to obtain a first product, where the first difference equals a predetermined value minus the update gate of the first unit, and the update gate is used to measure the proportions of the result that come from the j-th source-language attention context and from the j-th query state;
  • use the first unit to linearly transform the j-th source-language attention context through a linear transformation function and perform an element product operation on the obtained linear transformation function value and the linear transformation gate to obtain a second product; perform a non-linear transformation on the j-th source-language attention context and the j-th query state through a hyperbolic tangent function, and add the obtained hyperbolic tangent function value to the second product to obtain a candidate activation function value, where the linear transformation gate is used to control the candidate activation function value so that it includes the linear transformation function value;
  • use the first unit to perform an element product operation on the update gate and the candidate activation function value to obtain a third product; and
  • use the first unit to add the first product and the third product to obtain the j-th operation result.
  • With the sentence decoding apparatus provided by the embodiments of this application, the first unit in the processing node can perform both a linear operation and a non-linear operation on the j-th query state and the j-th source-language attention context. Therefore, when the decoding model is trained to obtain its weights, the first unit also performs linear and non-linear operations on the training data before producing its output. In this way, when the error between the output and the reference result is back-propagated, the error includes the error of both the linear operation part and the non-linear operation part; because the gradient of the error of the linear operation part is constant, it slows down the rate at which the gradient of the overall error decreases, which alleviates the problem that the weights of the decoding model become inaccurate when the gradient of the overall error decreases exponentially until it vanishes, thereby improving the accuracy of sentence processing.
  • This application also provides a server, which includes a processor and a memory; the memory stores at least one instruction, and the at least one instruction is loaded and executed by the processor to implement the sentence processing method or the sentence decoding method provided by the foregoing method embodiments. It should be noted that the server may be the server provided in FIG. 15 below.
  • FIG. 15 shows a schematic structural diagram of a server provided by an exemplary embodiment of the present application.
  • Specifically, the server 1500 includes a central processing unit (CPU) 1501, a system memory 1504 including a random access memory (RAM) 1502 and a read-only memory (ROM) 1503, and a system bus 1505 connecting the system memory 1504 and the central processing unit 1501.
  • The server 1500 also includes a basic input/output system (I/O system) 1506 that helps transfer information between the various devices in the computer, and a mass storage device 1507 for storing an operating system 1513, application programs 1514 and other program modules 1515.
  • the basic input/output system 1506 includes a display 1508 for displaying information and an input device 1509 for a user to input information, such as a mouse and a keyboard.
  • the display 1508 and the input device 1509 are both connected to the central processing unit 1501 through an input and output controller 1510 connected to the system bus 1505.
  • the basic input/output system 1506 may further include an input-output controller 1510 for receiving and processing input from a number of other devices such as a keyboard, a mouse, or an electronic stylus.
  • the input output controller 1510 also provides output to a display screen, printer, or other type of output device.
  • the mass storage device 1507 is connected to the central processing unit 1501 through a mass storage controller (not shown) connected to the system bus 1505.
  • The mass storage device 1507 and its associated computer-readable storage medium provide non-volatile storage for the server 1500. That is, the mass storage device 1507 may include a computer-readable storage medium (not shown) such as a hard disk or a CD-ROM drive.
  • the computer-readable storage medium may include a computer storage medium and a communication medium.
  • Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
  • Computer storage media include RAM, ROM, EPROM, EEPROM, flash memory, or other solid-state storage technologies, CD-ROM, DVD, or other optical storage, tape cassettes, magnetic tape, magnetic disk storage, or other magnetic storage devices.
  • the above-mentioned system memory 1504 and mass storage device 1507 may be collectively referred to as a memory.
  • the memory stores one or more programs that are configured to be executed by one or more central processing units 1501.
  • the one or more programs contain instructions for implementing the above sentence encoding or sentence decoding method.
  • The central processing unit 1501 executes the one or more programs to implement the sentence processing method or the sentence decoding method provided by the foregoing method embodiments.
  • According to various embodiments of the present invention, the server 1500 may also run by connecting, through a network such as the Internet, to a remote computer on the network. That is, the server 1500 may be connected to the network 1512 through the network interface unit 1511 connected to the system bus 1505, or the network interface unit 1511 may be used to connect to other types of networks or remote computer systems (not shown).
  • The memory further includes one or more programs stored in the memory, and the one or more programs contain instructions for performing the steps, performed by the server, of the sentence processing method or the sentence decoding method provided by the embodiments of the present invention.
  • Embodiments of this application also provide a computer-readable storage medium, where at least one instruction, at least one program, a code set or an instruction set is stored in the storage medium, and the at least one instruction, the at least one program, the code set or the instruction set is loaded and executed by the processor to implement the sentence processing method or the sentence decoding method described above.
  • the present application also provides a computer program product, which, when the computer program product runs on a computer, causes the computer to execute the sentence processing method or sentence decoding method provided by the foregoing method embodiments.
  • Some embodiments of the present application provide a computer-readable storage medium that stores at least one instruction, at least one program, code set, or instruction set, the at least one instruction, the at least one program, the The code set or instruction set is loaded and executed by the processor to implement the sentence processing method or sentence decoding method as described above.
  • It should be noted that, when the sentence encoding/decoding apparatus provided by the above embodiments performs sentence encoding or sentence decoding, the division into the above functional modules is only used as an example for illustration. In practical applications, the above functions may be allocated to different functional modules as required, that is, the internal structure of the sentence encoding/decoding apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the sentence encoding/decoding apparatus provided by the above embodiments and the sentence encoding/decoding method embodiments belong to the same concept; for the specific implementation process, refer to the method embodiments, which are not repeated here.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A sentence processing method, a sentence decoding method, an apparatus, a storage medium and a device. The sentence processing method is used in a coding model, where the coding model includes n cascaded processing nodes. A word segmentation operation is performed on a source sentence to be encoded to obtain m vocabularies (301); the i-th processing node among the n processing nodes is used to obtain the i-th vocabulary among the m vocabularies and to obtain the (i-1)-th vocabulary vector obtained by the (i-1)-th processing node, where the (i-1)-th vocabulary vector is the coding vector of the (i-1)-th vocabulary among the m vocabularies (302); the first unit in the i-th processing node is used to perform a linear operation and a non-linear operation on the i-th vocabulary and the (i-1)-th vocabulary vector, and the obtained i-th operation result is output to at least one second unit for processing to obtain the i-th vocabulary vector (303); after m vocabulary vectors are obtained, a sentence vector is generated according to the m vocabulary vectors, where the sentence vector is used to determine a target sentence or a target classification (304).

Description

语句处理方法、语句解码方法、装置、存储介质及设备
本申请要求于2018年11月29日提交中国专利局,申请号为201811444710.8,申请名称为“语句编码方法、语句解码方法、装置、存储介质及设备”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请实施例涉及语句处理领域,特别涉及一种语句处理方法、语句解码方法、装置、存储介质及设备。
背景技术
计算机可以对输入的一个语句进行处理后输出另一个语句。以机器翻译为例,机器翻译是指通过计算机将一种自然语言的语句翻译成另一种自然语言的语句的翻译方式。通常,机器翻译是通过训练好的机器学习模型对语句进行翻译的。比如,用户将中文语句“房价持续增长”输入机器学习模型后,该机器学习模型输出英文语句“The housing prices continued to rise”。
相关技术中,机器学习模型包括编码模型和解码模型,该编码模型用于将输入的一种自然语言的源语句编码成一个语句向量,并将该语句向量输出给解码模型;该解码模型用于将该语句向量解码成另一种自然语言的目标语句。示意性的,编码模型和解码模型都由神经网络模型构成。目前,语句处理模型进行语句处理的准确度较低。
发明内容
根据本申请提供的各种实施例,提供一种语句处理方法、语句解码方法、装置、存储介质及设备。
一方面,提供了一种语句处理方法,由语句处理设备执行,用于编码模型中,所述编码模型包括级联的n个处理节点,所述处理节点包括级联的一个第一单元和至少一个第二单元,n≥2;包括:
对待编码的源语句进行分词运算,得到m个词汇,m≤n;
利用所述n个处理节点中的第i个处理节点获取所述m个词汇中的第i个词 汇,并获取第i-1个处理节点得到的第i-1个词汇向量,所述第i-1个词汇向量是所述m个词汇中的第i-1个词汇的编码向量,i≤m;
利用所述第i个处理节点中的第一单元对所述第i个词汇和所述第i-1个词汇向量进行线性运算和非线性运算,将得到的第i个运算结果输出给所述至少一个第二单元进行处理,得到第i个词汇向量;及
在得到m个词汇向量后,根据所述m个词汇向量生成语句向量,所述语句向量用于确定目标语句或目标分类。
一方面,提供了一种语句解码方法,由语句处理设备执行,用于解码模型中,所述解码模型包括一个处理节点,所述处理节点包括级联的一个第一单元和至少一个第二单元;所述方法包括:
在第j个时刻,获取语句向量和第j个查询状态,所述语句向量是编码模型对待编码的源语句进行编码后得到的,所述第j个查询状态用于查询第j个时刻时所述源语句中编码的部分;
根据所述语句向量和所述第j个查询状态生成第j个源语言关注上下文,所述第j个源语言关注上下文是第j个时刻时所述源语句中编码的部分;
利用所述处理节点中的第一单元对所述第j个查询状态和所述第j个语言关注上下文进行线性运算和非线性运算,将得到的第j个运算结果输出给所述处理节点中的至少一个第二单元进行处理,得到第j个词汇;及
在得到k个词汇后,根据所述k个词汇生成目标语句,j≤k。
一方面,提供了一种语句处理装置,用于编码模型中,所述编码模型包括级联的n个处理节点,所述处理节点包括级联的一个第一单元和至少一个第二单元,n≥2;包括:
分词模块,用于对待编码的源语句进行分词运算,得到m个词汇,m≤n;
获取模块,用于利用所述n个处理节点中的第i个处理节点获取所述m个词汇中的第i个词汇,并获取第i-1个处理节点得到的第i-1个词汇向量,所述第i-1个词汇向量是所述m个词汇中的第i-1个词汇的编码向量,i≤m;
运算模块,用于利用所述第i个处理节点中的第一单元对所述第i个词汇和所述第i-1个词汇向量进行线性运算和非线性运算,将得到的第i个运算结果输出给所述至少一个第二单元进行处理,得到第i个词汇向量;及
生成模块,用于在得到m个词汇向量后,根据所述m个词汇向量生成语句向量,所述语句向量用于确定目标语句或目标分类。
一方面,提供了一种语句解码装置,用于解码模型中,所述解码模型包括 一个处理节点,所述处理节点包括级联的一个第一单元和至少一个第二单元;包括:
获取模块,用于在第j个时刻,获取语句向量和第j个查询状态,所述语句向量是编码模型对待编码的源语句进行编码后得到的,所述第j个查询状态用于查询第j个时刻时所述源语句中编码的部分;
生成模块,用于根据所述语句向量和所述第j个查询状态生成第j个源语言关注上下文,所述第j个源语言关注上下文是第j个时刻时所述源语句中编码的部分;
运算模块,用于利用所述处理节点中的第一单元对所述第j个查询状态和所述第j个语言关注上下文进行线性运算和非线性运算,将得到的第j个运算结果输出给所述处理节点中的至少一个第二单元进行处理,得到第j个词汇;及
所述生成模块,还用于在得到k个词汇后,根据所述k个词汇生成目标语句,j≤k。
一方面,提供了一个或多个存储有计算机可读指令的非易失性存储介质,所述计算机可读指令被一个或多个处理器执行时,使得一个或多个处理器执行如上所述的语句处理方法,或者执行如上所述的语句解码方法中的至少一种方法。
一方面,提供了一种语句处理设备,包括存储器和处理器,所述存储器中存储有计算机可读指令,所述计算机可读指令被所述处理器执行时,使得所述处理器执行如上所述的语句处理方法,或者执行如上所述的语句解码方法中的至少一种方法。
本申请的一个或多个实施例的细节在下面的附图和描述中提出。本申请的其它特征、目的和优点将从说明书、附图以及权利要求书变得明显。
附图说明
为了更清楚地说明本申请实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1是根据部分示例性实施例示出的一种语句处理系统的结构示意图;
图2是本申请一些实施例提供的一种编码模型的示意图;
图3是本申请一些实施例提供的语句处理方法的方法流程图;
图4是本申请一些实施例提供的一种编码模型的示意图;
图5是本申请一些实施例提供的一种编码模型的示意图;
图6是本申请一些实施例提供的语句处理方法的方法流程图;
图7是本申请一些实施例提供的语句解码方法的方法流程图;
图8是本申请一些实施例提供的语句解码方法的方法流程图;
图9是本申请一些实施例提供的一种解码模型的示意图;
图10是本申请一些实施例提供的一种解码模型的示意图;
图11是本申请一些实施例提供的一种解码模型的示意图;
图12是本申请一些实施例提供的一种解码模型和解码模型的示意图;
图13是本申请一些实施例提供的语句处理装置的结构框图;
图14是本申请一些实施例提供的语句解码装置的结构框图;及
图15是本申请再一实施例提供的服务器的结构框图。
具体实施方式
为使本申请实施例的目的、技术方案和优点更加清楚,下面将结合附图对本申请实施方式作进一步地详细描述。
人工智能(Artificial Intelligence,AI)是利用数字计算机或者数字计算机控制的机器模拟、延伸和扩展人的智能,感知环境、获取知识并使用知识获得最佳结果的理论、方法、技术及应用系统。换句话说,人工智能是计算机科学的一个综合技术,它企图了解智能的实质,并生产出一种新的能以人类智能相似的方式做出反应的智能机器。人工智能也就是研究各种智能机器的设计原理与实现方法,使机器具有感知、推理与决策的功能。
人工智能技术是一门综合学科,涉及领域广泛,既有硬件层面的技术也有软件层面的技术。人工智能基础技术一般包括如传感器、专用人工智能芯片、云计算、分布式存储、大数据处理技术、操作/交互系统、机电一体化等技术。人工智能软件技术主要包括计算机视觉技术、语音处理技术、自然语言处理技术以及机器学习/深度学习等几大方向。
语音技术(Speech Technology)的关键技术有自动语音识别技术(ASR)和语音合成技术(TTS)以及声纹识别技术。让计算机能听、能看、能说、能感觉,是未来人机交互的发展方向,其中语音成为未来最被看好的人机交互方式之一。
自然语言处理(Nature Language processing,NLP)是计算机科学领域与人工智能领域中的一个重要方向。它研究能实现人与计算机之间用自然语言进行有 效通信的各种理论和方法。自然语言处理是一门融语言学、计算机科学、数学于一体的科学。因此,这一领域的研究将涉及自然语言,即人们日常使用的语言,所以它与语言学的研究有着密切的联系。自然语言处理技术通常包括文本处理、语义理解、机器翻译、机器人问答、知识图谱等技术。
机器学习(Machine Learning,ML)是一门多领域交叉学科,涉及概率论、统计学、逼近论、凸分析、算法复杂度理论等多门学科。专门研究计算机怎样模拟或实现人类的学习行为,以获取新的知识或技能,重新组织已有的知识结构使之不断改善自身的性能。机器学习是人工智能的核心,是使计算机具有智能的根本途径,其应用遍及人工智能的各个领域。机器学习和深度学习通常包括人工神经网络、置信网络、强化学习、迁移学习、归纳学习、式教学习等技术。
随着人工智能技术研究和进步,人工智能技术在多个领域展开研究和应用,例如常见的智能家居、智能穿戴设备、虚拟助理、智能音箱、智能营销、无人驾驶、自动驾驶、无人机、机器人、智能医疗、智能客服等,相信随着技术的发展,人工智能技术将在更多的领域得到应用,并发挥越来越重要的价值。
本申请实施例提供的方案涉及人工智能的自然语言处理等技术,具体通过如下实施例进行说明:
首先,下面对本申请涉及的应用场景进行介绍:
本申请主要涉及两类应用场景,第一类应用场景中机器学习模型根据语句生成语句,第二类应用场景中机器学习模型对语句进行分类,下面分别对这两类应用场景进行说明。
第一类应用场景可以包括多个应用场景,下面以机器翻译、人机对话、文本自动生成这三个应用场景进行举例说明。
1)机器翻译
机器翻译:是指通过计算机将一种自然语言的语句翻译成另一种自然语言的语句的翻译方式。通常,该机器翻译通过训练好的机器学习模型对语句进行翻译。示意性的,通过大量的翻译语料样本对机器学习模型进行训练,该翻译语料样本中包括多组中文语料和英文语料的对应关系,每个中文语料对应一个英文语料作为翻译结果,训练完成后,用户将中文语句“房价持续增长”输入该机器学习模型后,输出英文翻译“The housing prices continued to rise”。
下面根据机器学习模型的调用方式,对机器翻译的应用场景进行举例说明。
第一种:机器学习模型的入口对用户可见。例如,机器学习模型的入口为一个输入框。
在一种可能的应用场景中,机器学习模型设置在一个应用程序中,且该应用程序中提供有输入框。当用户需要对语句A进行翻译时,用户可以在该应用程序中找到该输入框,在该输入框中输入语句A,机器学习模型将该语句A作为待翻译的源语句。其中,语句A可以是用户手动输入或从其他文本中复制得到的,本实施例不作限定。
在一种可能的应用场景中,将机器学习模型设置在服务器中,且该服务器对应的网页中提供有输入框。当用户需要对语句A进行翻译时,用户可以启动浏览器以打开该网页,在该网页中找到该输入框,在该输入框中输入语句A,浏览器将该语句A发送给服务器,服务器中的机器学习模型将该语句A作为待翻译的源语句。其中,语句A可以是用户手动输入或从其他文本中复制得到的,本实施例不作限定。
第二种:机器学习模型的入口对用户不可见。例如,机器学习模型中内嵌于某一个应用程序中,或者,机器学习模型能够被某一应用程序调用。
在一种可能的应用场景中,当用户在使用该应用程序浏览文本时,选中该文本中的语句A,此时该应用程序显示对该语句A的操作选项,若该操作选项中包括翻译选项,则当用户触发该操作选项时,该应用程序调用机器学习模型,将该语句A发送给该机器学习模型进行翻译。
2)人机对话
人机对话:是指通过计算机对用户输入的语句进行应答的对话方式。通常,该人机对话通过训练好的机器学习模型对语句进行应答。示意性的,通过大量的对话样本对机器学习模型进行训练,该对话样本中包括多组同种自然语言或不同种自然语句的对话。训练完成后,用户将语句“距离春节还有多少天”输入该机器学习模型后,输出应答语句“距离春节还有60天”。
3)文本自动生成
文本自动生成:是指通过计算机根据一个语句撰写一个或一段语句的文本生成方式。其中,当输入的语句的字数多于输出的语句的字数时,可以理解成是对输入的语句进行了内容提取,可以应用于摘要提取等应用场景中;当输入语句的字数少于输出的语句的字数时,可以理解成是对输入的语句进行了内容扩展,可以应用于语句的复写、文章生成等应用场景中。
通常,该文本自动生成通过训练好的机器学习模型生成文本。示意性的,用户将语句“本周末鲜花包邮”输入该机器学习模型后,输出一段关于鲜花促销的文本。
第二类应用场景可以包括多个应用场景,下面以情感分析、词性分析和实体分析这三个应用场景进行举例说明。
1)情感分析
情感分析:是指通过计算机根据语句对用户的情感进行分析的分类方式,这里所说的情感可以包括诸如悲伤、快乐之类的情绪、诸如忧郁、倦怠之类的心情、诸如冷漠、疏远之类的人际立场、诸如喜欢、讨厌之类的态度等等,本实施例不作限定。
通常,该情感分析通过训练好的机器学习模型对语句进行分析。示意性的,通过大量的情感分析样本对机器学习模型进行训练,该情感分析样本包括多组语句和情感的对应关系,每个语句对应一种情感。训练完成后,用户将语句“我很快乐”输入该机器学习模型后,输出“快乐”的分类结果。
2)词性分析
词性分析:是指通过计算机对语句中词汇的词性进行分析的分类方式,这里所说的词性可以包括动词、名词、形容词、介词、副词等等,本实施例不作限定。
通常,该词性分析通过训练好的机器学习模型对语句进行分析。示意性的,通过大量的词性分析样本对机器学习模型进行训练,该词性分析样本包括多组语句和词性的对应关系,每个语句中的一个词汇对应一种词性。训练完成后,用户将语句“我很快乐”输入该机器学习模型后,输出“我”属于名词分类、“很”属于副词分类、“快乐”属于形容词分类的分类结果。
3)命名实体分析
实体分析:是指通过计算机提取语句中的命名实体的分类方式,这里所说的命名实体可以包括人名、地名、组织等等。
通常,该命名实体分析通过训练好的机器学习模型对语句进行分析。示意性的,通过大量的命名实体分析样本对机器学习模型进行训练,该命名实体分析样本包括多组语句和命名实体的对应关系,每个语句中的一个命名实体对应一种命名实体。训练完成后,用户将语句“我在公司”输入该机器学习模型后,输出“公司”的分类结果。
值得注意的是,上述应用场景仅为示意性的举例,在实际操作中,通过机器学习模型实现语句的编码和解码的应用场景都可以使用本申请实施例中提供的方法,本申请实施例对此不作限定。
其次,对本申请中涉及的名词进行简单介绍:
在一些实施例中,上述机器学习模型可以实现为神经网络模型、支持向量机(Support Vector Machine,SVM)、决策树(Decision Tree,DT)等模型,本申请实施例对此不加以限定,本申请实施例中以该机器学习模型为RNN(Recurrent Neural Network,循环神经网络)模型为例进行说明。
编码模型:是指将一种自然语言的语句编码成一个语句向量的模型。其中,语句向量由语句中每个词汇对应的一个词汇向量组成,该词汇向量表示一个词汇在该语句中的向量,下文会对词汇向量的生成方式进行介绍,此处不作赘述。
示意性的,中文语句“房价持续增长”中包括“房价”、“持续”和“增长”这三个词汇,且“房价”对应于词汇向量1、“持续”对应于词汇向量2、“增长”对应于词汇向量3,则得到的语句向量=[词汇向量1,词汇向量2,词汇向量3]。
需要说明的是,在对语句编码成语句向量之前,需要对该语句进行分词运算,以得到至少两个词汇,本实施例不对分词运算作限定。这里所说的词汇是对语句进行分词得到的,可以是字、词、子词等,本实施例不作限定。其中,子词是在词的基础上进行分词得到的。例如,将词“北京大学”分为“北京”和“大学”这两个子词。
解码模型:是指将一个语句向量解码成一种自然语言的语句的模型。其中,解码模型每对语句向量进行一次解码得到一个词汇,将得到的所有词汇组成一个语句。
示意性的,语句向量=[词汇向量1,词汇向量2,词汇向量3],则解码模型对该语句向量进行第1次解码,得到词汇“The”,对该语句向量进行第2次解码,得到词汇“housing”,依此类推,直至对该语句向量进行第6次解码,得到词汇“rise”,将得到的6个词汇组成语句“The housing prices continued to rise”。
值得注意的是,本申请实施例可以实现在终端中,也可以实现在服务器中,还可以由终端和服务器共同实现,如图1所示,终端11用于生成源语句,并将该源语句发送至服务器12,服务器12对该源语句进行处理后,将处理结果发送至终端11进行展示。可选地,终端11与服务器12之间通过通信网络进行连接,该通信网络可以是有线网络也可以是无线网络,本申请实施例对此不加以限定。
示意性的,服务器12中存储有用于机器翻译的机器学习模型,用户在终端11中输入需要翻译的源语句“房价持续上涨”后,终端11将该源语句发送至服务器12,由服务器12对该源语句通过机器学习模型进行翻译后得到目标语句,并将该目标语句发送至终端11进行展示。
本申请中的机器学习模型包括编码模型和解码模型,下面先对编码模型的 结构进行介绍。请参考图2,该编码模型包括级联的n个处理节点201,该处理节点201包括级联的一个第一单元和至少一个第二单元,第一单元可以与第一个第二单元级联,处理节点201也可以包括多个第一单元,可以是最后一个第一单元与第一个第二单元级联。且图2中以阴影框表示第一单元,白框表示第二单元,则每个处理节点201中顺序包括:第一单元、第二单元、…、第二单元。其中,n≥2,
在一种可能的实现方式中,第一单元是具有非线性运算能力和线性运算能力的GRU,如L-GRU或对GRU做出其他线性变换改进的GRU;第二单元是T-GRU。下面分别对GRU、L-GRU和T-GRU进行介绍。
1)GRU(Gate Recurrent Unit,门控循环单元):
在第i个时刻时GRU的输出的计算公式为:
Figure PCTCN2019121420-appb-000001
其中,z i是GRU的更新门控,计算公式为z i=σ(W xzx i+W hzh i-1),x i是第i个时刻GRU的输入,h i-1是第i-1个时刻时GRU的输出,σ是激活函数;⊙是元素积运算的符号;
Figure PCTCN2019121420-appb-000002
是候选激活函数,计算公式为
Figure PCTCN2019121420-appb-000003
tanh是双曲正切函数;r i是GRU的重置门控,计算公式为r i=σ(W xrx i+W hrh i-1);W xz、W hz、W xh、W hh、W xr和W hr是GRU的权值,由训练得到。
更新门控z i用于衡量h i来自于x i和h i-1的比例。更新门控z i的数值越大表示来自h i-1的比例越大;更新门控z i的数值越小表示来自h i-1的比例越小。
重置门控r i用于衡量
Figure PCTCN2019121420-appb-000004
来自于x i和h i-1的比例。重置门控r i的数值越大表示来自h i-1的比例越小;更新门控z i的数值越小表示来自h i-1的比例越大。
2)T-GRU(Transition GRU,转换门控循环单元):
T-GRU不会出现在机器学习模型中的第一层,所以,T-GRU中不存在输入x i
在第i个时刻时T-GRU的输出的计算公式为:
Figure PCTCN2019121420-appb-000005
其中,z i是T-GRU的更新门控,计算公式为z i=σ(W hzh i-1),h i-1是第i-1个时刻时T-GRU的输出,σ是激活函数;⊙是元素积运算的符号;
Figure PCTCN2019121420-appb-000006
是候选激活函数,计算公式为
Figure PCTCN2019121420-appb-000007
tanh是双曲正切函数;r i是T-GRU的重置门 控,计算公式为r i=σ(W hrh i-1);W hz、W hh和W hr是T-GRU的权值,由训练得到。
更新门控z i用于衡量h i来自于h i-1的比例。更新门控z i的数值越大表示来自h i-1的比例越大;更新门控z i的数值越小表示来自h i-1的比例越小。
重置门控r i用于衡量
Figure PCTCN2019121420-appb-000008
来自于h i-1的比例。重置门控r i的数值越大表示来自h i-1的比例越小;更新门控z i的数值越小表示来自h i-1的比例越大。
3)L-GRU(Linear Transformation enhanced GRU,线性变换强化的门控循环单元):
在第i个时刻时L-GRU的输出的计算公式为:
Figure PCTCN2019121420-appb-000009
其中,z i是L-GRU的更新门控,计算公式为z i=σ(W xzx i+W hzh i-1),x i是第i个时刻L-GRU的输入,h i-1是第i-1个时刻时L-GRU的输出,σ是激活函数;⊙是元素积运算的符号;
Figure PCTCN2019121420-appb-000010
是候选激活函数,计算公式为
Figure PCTCN2019121420-appb-000011
tanh是双曲正切函数;r i是L-GRU的重置门控,计算公式为r i=σ(W xrx i+W hrh i-1);H是线性变换函数,计算公式为H(x i)=W xx i;l i是L-GRU的线性变换门控,计算公式为l i=σ(W xlx i+W hlh i-1);W xz、W hz、W xh、W hh、W xr、W hr、W x、W xl和W hl是L-GRU的权值,由训练得到。
更新门控z i用于衡量h i来自于x i和h i-1的比例。更新门控z i的数值越大表示来自h i-1的比例越大;更新门控z i的数值越小表示来自h i-1的比例越小。
重置门控r i用于衡量
Figure PCTCN2019121420-appb-000012
来自于x i和h i-1的比例。重置门控r i的数值越大表示来自h i-1的比例越小;更新门控z i的数值越小表示来自h i-1的比例越大。
线性变换门控l i用于控制候选激活函数值包含线性变换函数值。换句话说,线性变换门控l i用于强化候选激活函数值,使候选激活函数值在一定程度上包含对x i的线性变换结果。
在了解了编码模型的结构后,下面对利用编码模型对语句进行编码的方法进行介绍。
请参考图3,其示出了本申请一些实施例提供的语句处理方法的方法流程图,该语句处理方法,包括:
步骤301,对待编码的源语句进行分词运算,得到m个词汇。
源语句是指对应于一种自然语言的语句。其中,源语句可以是用户输入的,也可以是用户从文本中选定的。
以本实施例的方法应用于机器翻译的应用场景为例,则可选的,源语句为用户输入的待翻译的语句,可选的,该源语句也可以是用户在浏览文本时选中生成的,如:用户浏览文章时选中文字内容“房价持续上涨”后,选择翻译选项后,该被选中的文字内容即为源语句。
在编码模型得到源语句后,可以对该源语句进行分词运算,本实施例不对分词运算的运算方式作限定。
本实施例中,源语句中的每个词汇对应于一个处理节点,所以,分词得到的词汇的数量m需要小于等于机器学习模型中处理节点的数量n,即m≤n。
在得到m个词汇后,编码模型可以执行步骤302和303,在得到一个词汇向量后,将i更新为i+1,继续执行步骤302和303以得到下一个词汇向量,依此类推,直至将i更新为m以得到第m个词汇向量后停止循环,执行步骤304。
步骤302,利用n个处理节点中的第i个处理节点获取m个词汇中的第i个词汇,并获取第i-1个处理节点得到的第i-1个词汇向量,该第i-1个词汇向量是m个词汇中的第i-1个词汇的编码向量。
在编码模型得到m个词汇后,按照每个词汇在源语句中的位置,对该m个词汇进行排序。例如,源语句为“房价持续增长”,则排序后的3个词汇为“房价”、“持续”和“增长”。
其中,第i-1个词汇向量是根据m个词汇中前i-2个词汇生成的,但其表示的是第i-1个词汇的编码向量。例如,图2中的h 1是根据第1个词汇生成的,且其表示第1个词汇向量;h 2是根据第1和2个词汇生成的,且其表示第2个词汇向量。
步骤303,利用第i个处理节点中的第一单元对第i个词汇和第i-1个词汇向量进行线性运算和非线性运算,将得到的第i个运算结果输出给至少一个第二单元进行处理,得到第i个词汇向量。
以第一单元是L-GRU,第二单元是T-GRU为例,对编码模型在每个时刻得到一个词汇向量的流程进行介绍。其中,本实施例中将处理节点从接收数据到输出数据的时间称为一个时刻,也称为一个时间步。
在第1个时刻,第一个处理节点中的L-GRU接收源语句中的第一个词汇x 1,根据计算公式对x 1进行线性运算和非线性运算后输出给该第一个处理节点中的第一个T-GRU,该第一个T-GRU根据计算公式对接收到的数据进行处理后输出 给该第一个处理节点中的第二个T-GRU,依此类推,直至该第一个处理节点中的最后一个T-GRU根据计算公式对接收到的数据进行处理后得到h 1,该h 1即为x 1对应的词汇向量。
在第2个时刻,第二个处理节点中的L-GRU接收源语句中的第二个词汇x 2和第一个处理节点得到的h 1,根据计算公式对x 1和h 1进行线性运算和非线性运算后输出给该第二个处理节点中的第一个T-GRU,该第一个T-GRU根据计算公式对接收到的数据进行处理后输出给该第二个处理节点中的第二个T-GRU,依此类推,直至该第二个处理节点中的最后一个T-GRU根据计算公式对接收到的数据进行处理后得到h 2,该h 2即为x 2对应的词汇向量。
依此类推,第n个处理节点可以得到h m
需要说明的是,处理节点中T-GRU的数量可以是预先设置的。通常,T-GRU的数量与语句处理的准确性呈正相关关系,即T-GRU的数量越多,语句处理的准确性越高。然而,随着T-GRU的数量的增多,准确性的增幅会逐渐减小,而机器学习模型的复杂性会逐渐增大,导致语句处理的效率降低。所以,可以根据用户对语句处理的准确性和效率的需求,设置T-GRU的数量。
步骤304,在得到m个词汇向量后,根据m个词汇向量生成语句向量,该语句向量用于确定目标语句或目标分类。
在编码模型得到m个词汇向量后,按照每个词汇向量对应的词汇在源语句中的位置,对该m个词汇向量进行排序。例如,“房价”对应于词汇向量1、“持续”对应于词汇向量2、“增长”对应于词汇向量3,则得到的语句向量=[词汇向量1,词汇向量2,词汇向量3]。得到语句向量后,可以利用解码模型对语句向量进行解码,得到目标语句或者目标分类。
当本实施例的方法应用于第一类应用场景中时,语句向量用于供解码模型生成目标语句,该目标语句是指对应于一种自然语言的语句。其中,当本实施例的方法应用于机器翻译的应用场景中时,源语句对应的自然语言和目标语句对应的自然语言不同。比如,源语句对应的自然语言是中文,目标语句对应的自然语言是英文;源语句对应的自然语言是法文,目标语句对应的自然语言是英文;源语句对应的自然语言是英文,目标语句对应的自然语言是中文。当本实施例的方法应用于人机对话或文本自动生成的应用场景中时,源语句对应的自然语言和目标语句对应的自然语言可以相同,也可以不同。
当本实施例的方法应用于第二类应用场景中时,语句向量用于确定目标分类。其中,当本实施例的方法应用于情感分析的应用场景中时,目标分类是情 感分类。放本实施例的方法应用于词性分析的应用场景中时,目标分类是词性分类。当本实施例的方法应用于命名实体分析的应用场景中时,目标分类是命名实体分类。
采用本申请实施例提供的语句处理方法,由于处理节点中的第一单元可以对第i个词汇和第i-1个词汇向量进行线性运算和非线性运算,即可以根据上下文确定当前词汇的词汇向量,故能够提取得到更加准确的词汇向量。进一步地,由于机器学习模型处理语句时需要依赖于训练得到的权值,而训练需要涉及反向传播算法,即沿着训练数据的输出路径反向传输输出与参考结果之间的误差,以便于根据该误差来修改权值。然而,反向传播时机器学习模型中误差的梯度会呈指数级下降直至消失,使机器学习模型中前面的权值更新较慢,后面的权值更新较快,导致训练得到的权值不准确,也就导致语句处理的准确性较低,所以,在对编码模型进行训练以得到该编码模型的权值时,该第一单元也会对训练数据进行线性运算和非线性运算后输出,这样,在反向传播输出与参考结果的误差时,该误差中包括线性运算部分和非线性运算部分的误差,由于线性运算部分的误差的梯度是常量,可以减缓整个误差的梯度的下降速度,也就改善了因为整个误差的梯度呈指数级下降直至消失时,导致编码模型的权值不准确的问题,从而提高了语句处理的准确性。
在一些实施例中,还可以根据编码模型的编码方向设置编码模型的类型,下面对其中的三种编码模型进行介绍。
1、单向编码模型且编码方向为从前往后;
请参考图2,图2中以从左向右的方向来表示从前往后的编码方向,且阴影框表示L-GRU,白框表示T-GRU。此时,第i个处理节点是按照从前往后的顺序排列在n个处理节点中的第i个位置的处理节点201;第i个词汇是按照从前往后的顺序排列在m个词汇中的第i个位置的词汇。
比如,编码模型得到的m个词汇是“房价”、“持续”和“上涨”,则从左向右的方向上的第一个处理节点201处理的第1个词汇是“房价”,第二个处理节点201处理的第2个词汇是“持续”,第三个处理节点201处理的第3个词汇是“上涨”。
2、单向编码模型且编码方向为从后往前;
请参考图4,图4中以从右向左的方向来表示从后往前的编码方向,且阴影框表示L-GRU,白框表示T-GRU。此时,第i个处理节点是按照从后往前的顺序排列在n个处理节点中的第i个位置的处理节点401;第i个词汇是按照从后 往前的顺序排列在m个词汇中的第i个位置的词汇。
比如,编码模型得到的m个词汇是“房价”、“持续”和“上涨”,则从右向左的方向上的第一个处理节点401处理的第1个词汇是“上涨”,第二个处理节点401处理的第2个词汇是“持续”,第三个处理节点401处理的第3个词汇是“房价”。
3、双向编码模型且编码方向包括从前往后和从后往前;
请参考图5,图5中以从左向右的方向来表示从前往后的编码方向、以从右向左的方向来表示从后往前的编码方向,且阴影框表示L-GRU,白框表示T-GRU。此时,第i个处理节点包括按照从前往后的顺序排列在n个处理节点中的第i个位置的处理节点501,以及按照从后往前的顺序排列在n个处理节点中的第i个位置的处理节点502;第i个词汇包括按照从前往后的顺序排列在m个词汇中的第i个位置的词汇,以及按照从后往前的顺序排列在m个词汇中的第i个位置的词汇。
比如,编码模型得到的m个词汇是“房价”、“持续”和“上涨”,则从左向右的方向上的第一个处理节点501处理的第1个词汇是“房价”,第二个处理节点501处理的第2个词汇是“持续”,第三个处理节点501处理的第3个词汇是“上涨”;从右向左的方向上的第一个处理节点502处理的第1个词汇是“上涨”,第二个处理节点502处理的第2个词汇是“持续”,第三个处理节点502处理的第3个词汇是“房价”。
请参考图6,其示出了本申请另一实施例提供的语句处理方法的方法流程图。该语句处理方法,包括:
步骤601,对待编码的源语句进行分词运算,得到m个词汇。
在得到m个词汇后,编码模型可以执行步骤602-607,在得到一个词汇向量后,将i更新为i+1,继续执行步骤602-607以得到下一个词汇向量,依此类推,直至将i更新为m以得到第m个词汇向量后停止循环,执行步骤608。
步骤602,利用n个处理节点中的第i个处理节点获取m个词汇中的第i个词汇,并获取第i-1个处理节点得到的第i-1个词汇向量,该第i-1个词汇向量是m个词汇中的第i-1个词汇的编码向量。
步骤603,利用第一单元将第i-1个词汇向量与第一差值进行元素积运算,得到第一乘积。
第一差值等于预定数值减去第一单元的更新门控,预定数值可以是1,也可以是其他数值,本实施例不作限定。
更新门控用于衡量第i个词汇向量来自于第i个词汇和第i-1个词汇向量的比例。更新门控的计算公式详见L-GRU中的说明,此处不作赘述。
在步骤604中,利用第一单元通过线性变换函数对第i个词汇进行线性变换,将得到的线性变换函数值与线性变换门控进行元素积运算,得到第二乘积;通过双曲正切函数对第i个词汇和第i-1个词汇向量进行非线性变换,将得到的双曲正切函数值与第二乘积相加,得到候选激活函数值。
线性变换门控用于控制候选激活函数值包含线性变换函数值。线性变换门控的计算公式详见L-GRU中的说明,此处不作赘述。
在步骤605中,利用第一单元将更新门控与候选激活函数值进行元素积运算,得到第三乘积。
在步骤606中,利用第一单元将第一乘积与第三乘积相加,得到第i个运算结果。
需要说明的是,步骤603-606中对数据的处理过程即为一个处理节点中的L-GRU根据计算公式对数据的处理过程,详见上面L-GRU的计算公式,此处不作赘述。
假设处理节点的深度为l s,即处理节点中L-GRU和T-GRU的数量之和为l s,则第i个运算结果为
Figure PCTCN2019121420-appb-000013
在步骤607中,利用第i个处理节点中的第一单元将得到的第i个运算结果输出给至少一个第二单元进行处理,得到第i个词汇向量。
第i个处理节点中的L-GRU将得到的第i个运算结果输出给该第i个处理节点中的第一个T-GRU,该第i个处理节点中的第一个T-GRU对接收到的数据进行处理后输出给该第i个处理节点中的第二个T-GRU,依此类推,直至该第i个处理节点中的最后一个T-GRU根据计算公式对接收到的数据进行处理后得到第i个词汇向量,该第i个词汇向量即为第i个词汇对应的词汇向量。
假设处理节点的深度为l s,则第k个T-GRU的输出为
Figure PCTCN2019121420-appb-000014
1≤k≤l s
若编码模型为单向编码模型且编码方向为从前往后,则第i个词汇向量为
Figure PCTCN2019121420-appb-000015
若编码模型为单向编码模型且编码方向为从后往前,则第i个词汇向量为
Figure PCTCN2019121420-appb-000016
若编码模型为双向编码模型且编码方向包括从前往后和从后往前,则第i个词汇向量为
Figure PCTCN2019121420-appb-000017
其中
Figure PCTCN2019121420-appb-000018
步骤608,在得到m个词汇向量后,根据m个词汇向量生成语句向量,该语句向量用于生成目标语句,且该目标语句与该源语句对应于不同的自然语言。
综上所述,本申请实施例提供的语句处理方法,由于处理节点中的第一单元可以对第i个词汇和第i-1个词汇向量进行线性运算和非线性运算,所以,在对编码模型进行训练以得到该编码模型的权值时,该第一单元也会对训练数据进行线性运算和非线性运算后输出,这样,在反向传播输出与参考结果的误差时,该误差中包括线性运算部分和非线性运算部分的误差,由于线性运算部分的误差的梯度是常量,可以减缓整个误差的梯度的下降速度,也就改善了因为整个误差的梯度呈指数级下降直至消失时,导致编码模型的权值不准确的问题,从而提高了语句处理的准确性。
请参考图7,其示出了本申请一些实施例提供的语句解码方法的方法流程图,该解码模型包括一个处理节点,该处理节点包括级联的一个第一单元和至少一个第二单元。其中,第一单元是具有非线性运算能力和线性运算能力的GRU,如上文所述的L-GRU或对GRU做出其他线性变换改进的GRU;第二单元是T-GRU。该语句解码方法,包括:
步骤701,在第j个时刻,获取语句向量和第j个查询状态,该语句向量是编码模型对待编码的源语句进行编码后得到的,该第j个查询状态用于查询第j个时刻时源语句中编码的部分。
语句向量可以是如图2-6所示的编码模型根据源语句生成的,也可以是其他编码模型根据源语句生成的,本实施例不作限定。其中,源语句是指对应于一种自然语言的语句。
查询状态表示到当前时刻已经编码的历史状态,用于查询源语句,以获取下一时刻源语句中最可能编码的部分,该部分可以是字、词、短语、不连续的片段等等,本实施例不作限定。
步骤702,根据语句向量和第j个查询状态生成第j个源语言关注上下文,该第j个源语言关注上下文是第j个时刻时源语句中编码的部分。
其中,解码模型可以利用注意力运算对语句向量和第j个查询状态生成第j个源语言关注上下文,该源语言关注上下文是当前时刻源语句中最可能编码的 部分,详见下文中的描述。
步骤703,利用处理节点中的第一单元对第j个查询状态和第j个语言关注上下文进行线性运算和非线性运算,将得到的第j个运算结果输出给处理节点中的至少一个第二单元进行处理,得到第j个词汇。
当第一单元是L-GRU,第二单元是T-GRU时,处理节点中的L-GRU可以根据上面介绍的计算公式对第j个查询状态和第j个语言关注上下文进行线性运算和非线性运算,并将得到的第j个运算结果输出给该处理节点中的第一个T-GRU,该处理节点中的第一个T-GRU对接收到的数据进行处理后输出给该处理节点中的第二个T-GRU,依此类推,直至该处理节点中的最后一个T-GRU根据计算公式对接收到的数据进行处理后得到第j个词汇,该第j个词汇即为按照从前往后的顺序排列在目标语句中的第j个位置的词汇。
需要说明的是,假设j≤k,则解码模型可以执行步骤701-703,在得到一个词汇后,将j更新为j+1,继续执行步骤701-703以得到下一个词汇,依此类推,直至将j更新为k以得到第k个词汇后停止循环,执行步骤704。
步骤704,在得到k个词汇后,根据k个词汇生成目标语句,该目标语句与源语句对应于不同的自然语言。
在解码模型得到k个词汇后,按照每个词汇的生成顺序对该k个词汇进行排序,得到目标语句。例如,解码模型得到的第1个词汇是“The”,第2个词汇是“housing”,第3个词汇是“prices”,第4个词汇是“continued”,第5个词汇是“to”,第6个词汇是“rise”,则目标语句是“The housing prices continued to rise”。
当本实施例的方法应用于机器翻译的应用场景中时,源语句对应的自然语言和目标语句对应的自然语言不同。比如,源语句对应的自然语言是中文,目标语句对应的自然语言是英文;源语句对应的自然语言是法文,目标语句对应的自然语言是英文;源语句对应的自然语言是英文,目标语句对应的自然语言是中文。
当本实施例的方法应用于人机对话或文本自动生成的应用场景中时,源语句对应的自然语言和目标语句对应的自然语言可以相同,也可以不同,本实施例不作限定。
综上所述,本申请实施例提供的语句解码方法,由于处理节点中的第一单元可以对第j个查询状态和第j个源语言关注上下文进行线性运算和非线性运算,所以,在对解码模型进行训练以得到该解码模型的权值时,该第一单元也会对训练数据进行线性运算和非线性运算后输出,这样,在反向传播输出与参 考结果的误差时,该误差中包括线性运算部分和非线性运算部分的误差,由于线性运算部分的误差的梯度是常量,可以减缓整个误差的梯度的下降速度,也就改善了因为整个误差的梯度呈指数级下降直至消失时,导致解码模型的权值不准确的问题,从而提高了语句处理的准确性。
请参考图8,其示出了本申请一些实施例提供的语句解码方法的方法流程图,该解码模型包括一个处理节点,当然也可以包括多个处理节点,每个处理节点可以包括级联的一个第一单元和至少一个第二单元。其中,第一单元是具有非线性运算能力和线性运算能力的GRU,如上文所述的L-GRU或对GRU做出其他线性变换改进的GRU;第二单元是T-GRU。该语句解码方法,包括:
步骤801,在第j个时刻,获取语句向量和第j个查询状态,该语句向量是编码模型对待编码的源语句进行编码后得到的,该第j个查询状态用于查询第j个时刻时源语句中编码的部分。
在一些实施例中,解码模型可以通过查询节点获取第j个查询状态,该查询节点与处理节点相连。下面对查询节点的三种实现方式进行介绍。
在第一种实现方式中,查询节点包括一个第一单元和至少一个第二单元,则获取第j个查询状态,包括:利用查询节点中的第一单元获取第j-1个解码状态和第j-1个词汇,该第j-1个解码状态是处理节点根据第j-1个运算结果得到的,且该第j-1个解码状态用于确定第j-1个词汇;利用查询节点中的第一单元对第j-1个解码状态和第j-1个词汇进行线性运算和非线性运算,将得到的中间运算结果输出给查询节点中的至少一个第二单元进行处理,得到第j个查询状态。
当第一单元是L-GRU,第二单元是T-GRU时,请参考图9,图9中以阴影框表示L-GRU,以白框表示T-GRU,以虚线框901表示处理节点,以虚线框902表示查询节点。
处理节点中的L-GRU可以根据上面介绍的计算公式对第j-1个解码状态和第j-1个词汇进行线性运算和非线性运算,并将得到的中间运算结果输出给该处理节点中的第一个T-GRU,该处理节点中的第一个T-GRU对接收到的数据进行处理后输出给该处理节点中的第二个T-GRU,依此类推,直至该处理节点中的最后一个T-GRU根据计算公式对接收到的数据进行处理后得到第j个查询状态。
假设查询节点的深度为l q,处理节点的深度为l d,即查询节点中L-GRU和T-GRU的数量之和为l q,处理节点中L-GRU和T-GRU的数量之和为l d,则中间运算结果为
Figure PCTCN2019121420-appb-000019
第k个T-GRU的运算结果为 s j,k=T-GRU k(s j,k-1),1≤k≤l q。其中,s表示解码状态,y表示目标语句中的词汇。
与查询节点包括一个GRU相比,本申请中不仅加深了查询节点的深度,即在查询节点中增加了T-GRU,从而提高了解码模型的学习能力;还将GRU修改为L-GRU,从而提高了解码模型的权值的准确性,以提高语句处理的准确性。
在第二种实现方式中,查询节点包括一个第一单元,则获取第j个查询状态,包括:利用查询节点中的第一单元获取第j-1个解码状态和第j-1个词汇,第j-1个解码状态是处理节点根据第j-1个运算结果得到的,且第j-1个解码状态用于确定第j-1个词汇;利用查询节点中的第一单元对第j-1个解码状态和第j-1个词汇进行线性运算和非线性运算,得到第j个查询状态。
当第一单元是L-GRU,请参考图10,图10中以阴影框表示L-GRU,以白框表示T-GRU,以虚线框901表示处理节点,以虚线框902表示查询节点。
处理节点中的L-GRU可以根据上面介绍的计算公式对第j-1个解码状态和第j-1个词汇进行线性运算和非线性运算,直接得到第j个查询状态。
与查询节点包括一个GRU相比,本申请中将GRU修改为L-GRU提高了解码模型的权值的准确性,以提高语句处理的准确性。
在第三种实现方式中,查询节点包括一个第三单元和至少一个第二单元,则获取第j个查询状态,包括:利用查询节点中的第三单元获取第j-1个解码状态和第j-1个词汇,第j-1个解码状态是处理节点根据第j-1个运算结果得到的,且第j-1个解码状态用于确定第j-1个词汇;利用查询节点中的第三单元对第j-1个解码状态和第j-1个词汇进行非线性运算,将得到的中间运算结果输出给查询节点中的至少一个第二单元进行处理,得到第j个查询状态。
当第三单元是GRU,第二单元是T-GRU时,请参考图11,图11中以黑框表示GRU,以白框表示T-GRU,以虚线框901表示处理节点,以虚线框902表示查询节点。
处理节点中的GRU可以根据上面介绍的计算公式对第j-1个解码状态和第j-1个词汇进行非线性运算,并将得到的中间运算结果输出给该处理节点中的第一个T-GRU,该处理节点中的第一个T-GRU对接收到的数据进行处理后输出给该处理节点中的第二个T-GRU,依此类推,直至该处理节点中的最后一个T-GRU根据计算公式对接收到的数据进行处理后得到第j个查询状态。
与查询节点包括一个GRU相比,本申请中加深了查询节点的深度,即在查 询节点中增加了T-GRU,从而提高了解码模型的学习能力。
步骤802,当解码模型还包括注意力运算节点时,利用注意力运算节点对语句向量和第j个查询状态进行注意力运算,得到第j个源语言关注上下文,该第j个源语言关注上下文是第j个时刻时源语句中编码的部分。
其中,注意力运算节点分别与编码模型、查询节点和处理节点相连,详见图9-11,图9-11中以虚线框903表示注意力运算节点。
在一些实施例中,本申请中的注意力运算节点可以是多注意力运算模型,也可以是其他注意力模型,例如传统的注意力计算模型、局部和全局注意力模型等等,本实施例不作限定。
以多注意力运算模型为例,则第j个源语言关注上下文
Figure PCTCN2019121420-appb-000020
C是语句向量,v是查询状态。
步骤803,利用第一单元将第j个查询状态与第一差值进行元素积运算,得到第一乘积,第一差值等于预定数值减去第一单元的更新门控。
第一差值等于预定数值减去第一单元的更新门控,预定数值可以是1,也可以是其他数值,本实施例不作限定。
其中,更新门控用于衡量第j个语言关注上下文向量来自于第j个语言关注上下文和第j个查询状态的比例。更新门控的计算公式详见L-GRU中的说明,此处不作赘述。
步骤804,利用第一单元通过线性变换函数对第j个语言关注上下文进行线性变换,将得到的线性变换函数值与线性变换门控进行元素积运算,得到第二乘积;通过双曲正切函数对第j个语言关注上下文和第j个查询状态进行非线性变换,将得到的双曲正切函数值与第二乘积相加,得到候选激活函数值。
其中,线性变换门控用于控制候选激活函数值包含线性变换函数值。线性变换门控的计算公式详见L-GRU中的说明,此处不作赘述。
步骤805,利用第一单元将更新门控与候选激活函数值进行元素积运算,得到第三乘积。
步骤806,利用第一单元将第一乘积与第三乘积相加,得到第j个运算结果。
需要说明的是,步骤803-807中对数据的处理过程即为处理节点中的L-GRU根据计算公式对数据的处理过程,简单表示为
Figure PCTCN2019121420-appb-000021
此处不作赘述。其中,c是源语言关注上下文,v是查询状态。
步骤807,将得到的第j个运算结果输出给处理节点中的至少一个第二单元进行处理,得到第j个词汇。
处理节点中的L-GRU将得到的第j个运算结果输出给该处理节点中的第一个T-GRU,该处理节点中的第一个T-GRU对接收到的数据进行处理后输出给该处理节点中的第二个T-GRU,依此类推,直至该处理节点中的最后一个T-GRU根据计算公式对接收到的数据进行处理后得到第j个词汇,该第j个词汇即为按照从前往后的顺序排列在目标语句中的第j个位置的词汇。
若用公式表示,则T-GRU的输出为
Figure PCTCN2019121420-appb-000022
2≤p≤l d+1。
需要说明的第一点是,解码模型生成了第j个解码状态后,还获取第j-1个词汇、第j个解码状态、第j个源语言关注上下文,并根据上述三个数据计算输出向量o j,计算公式为
Figure PCTCN2019121420-appb-000023
Figure PCTCN2019121420-appb-000024
是解码模型的权值,通过训练得到;
Figure PCTCN2019121420-appb-000025
解码模型再通过softmax获取该输出向量o j,根据计算公式
Figure PCTCN2019121420-appb-000026
计算第j个词汇是词汇表中每个词汇的概率,并将最大概率所对应的词汇作为第j个词汇。其中,词汇表是预先设置在解码模型中的。
需要说明的第二点是,假设j≤k,则解码模型可以执行步骤801-807,在得到一个词汇后,将j更新为j+1,继续执行步骤801-807以得到下一个词汇,依此类推,直至将j更新为k以得到第k个词汇后停止循环,执行步骤808。下面对解码模型在每个时刻得到一个词汇的流程进行介绍。其中,本实施例中将从查询节点接收数据到softmax(归一化函数)输出数据的时间称为一个时刻,也称为一个时间步。
在第1个时刻,查询节点获取初始的查询状态和词汇,根据计算公式对初始的查询状态和词汇进行处理后得到第1个查询状态v 1;注意力运算节点获取语句向量和v 1,根据计算公式对该语句向量和v 1进行处理后得到第1个源语言关注上下文c 1;处理节点中的L-GRU获取v 1和c 1,根据计算公式对该v 1和c 1进行线性运算和非线性运算后输出给该处理节点中的第一个T-GRU;该处理节点中的第一个T-GRU根据计算公式对接收到的数据进行处理后输出该处理节点中的第二个T-GRU,依此类推,直至该处理节点中的最后一个T-GRU根据计算公式对接收到的数据进行处理后得到第1个解码状态s 1;sofrmax获取c 1和s 1, 根据计算公式对c 1和s 1进行处理后得到第1个词汇y 1
在第2个时刻,查询节点获取v 1和y 1,根据计算公式对该v 1和y 1进行处理后得到第2个查询状态v 2;后续处理流程与第1个时刻的处理流程相同,最终得到第2个词汇y 2
依此类推,解码模型可以得到第k个词汇y k,则解码模型最终得到的语句为y 1y 2…y i…y k
步骤808,在得到k个词汇后,根据k个词汇生成目标语句。
在解码模型得到k个词汇后,按照每个词汇的生成顺序对该k个词汇进行排序,得到目标语句。例如,解码模型得到的第1个词汇是“The”,第2个词汇是“housing”,第3个词汇是“prices”,第4个词汇是“continued”,第5个词汇是“to”,第6个词汇是“rise”,则目标语句是“The housing prices continued to rise”。
当本实施例的方法应用于机器翻译的应用场景中时,源语句对应的自然语言和目标语句对应的自然语言不同。比如,源语句对应的自然语言是中文,目标语句对应的自然语言是英文;源语句对应的自然语言是法文,目标语句对应的自然语言是英文;源语句对应的自然语言是英文,目标语句对应的自然语言是中文。当本实施例的方法应用于人机对话或文本自动生成的应用场景中时,源语句对应的自然语言和目标语句对应的自然语言可以相同,也可以不同,本实施例不作限定。
综上所述,本申请实施例提供的语句解码方法,由于处理节点中的第一单元可以对第j个查询状态和第j个源语言关注上下文进行线性运算和非线性运算,所以,在对解码模型进行训练以得到该解码模型的权值时,该第一单元也会对训练数据进行线性运算和非线性运算后输出,这样,在反向传播输出与参考结果的误差时,该误差中包括线性运算部分和非线性运算部分的误差,由于线性运算部分的误差的梯度是常量,可以减缓整个误差的梯度的下降速度,也就改善了因为整个误差的梯度呈指数级下降直至消失时,导致解码模型的权值不准确的问题,从而提高了语句处理的准确性。
与查询节点包括一个GRU相比,本申请中加深了查询节点的深度,加深查询节点的深度可以采用以下方式的至少一种:在查询节点中增加了T-GRU,从而提高了解码模型的学习能力;或将GRU修改为L-GRU,从而提高了解码模型的权值的准确性,以提高语句处理的准确性。
需要说明的是,本申请中不仅通过双曲正切函数对数据进行非线性运算,以保证机器学习模型的学习能力,还通过线性变换函数对数据进行线性运算, 这样,反向传播的误差中包括线性运算部分和非线性运算部分的误差,由于线性运算部分的误差的梯度是常量,可以减缓整个误差的梯度的下降速度,也就改善了因为整个误差的梯度呈指数级下降直至消失时,导致编码模型的权值不准确的问题,从而提高了语句处理的准确性。
在一些实施例中,本申请中还可以将上述编码模型和解码模型进行结合,以得到一个包含编码和解码能力的机器学习模型,即将图2、4和5中任一编码模型和图9-11中任一解码模型相结合。请参考图12,图12中以编码模型为双向编码模型,且解码模型中的查询器包括一个第一单元和至少一个第二单元为例进行举例说明。
仍然以源语句为“房价持续增长”,且该机器学习模型应用于机器翻译的应用场景中为例,则图12所示的机器学习模型中的编码模型先对源语句进行分词,得到“房价”、“持续”和“增长”这三个词汇;根据从前往后的编码方向,分别利用前3个处理节点对这三个词汇进行处理,依次得到对应于“房价”的词汇向量1、对应于“持续”的词汇向量2、对应于“增长”的词汇向量3;根据从后往前的编码方向,分别利用后3个处理节点对这三个词汇进行处理,依次得到对应于“增长”的词汇向量4、对应于“持续”的词汇向量5、对应于“房价”的词汇向量6,则得到的语句向量为[词汇向量1词汇向量6、词汇向量2词汇向量5、词汇向量3词汇向量4],将该语句向量输出给解码模型。
解码模型利用上述解码方法对该语句向量进行解码,在第一次解码时得到词汇“The”,在第二次解码时得到词汇“housing”,在第三次解码时得到词汇“prices”,在第四次解码时得到词汇“continued”,在第五次解码时得到词汇“to”,在第六次解码时得到词汇“rise”,则目标语句是“The housing prices continued to rise”。
当机器学习模型包括上述编码模型和解码模型时,编码模型的处理节点中T-GRU的数量、解码模型的处理节点中T-GRU的数量、解码模型的查询节点中T-GRU的数量可以相等,也可以不等。
下面以上述三种节点中T-GRU的数量相等,且T-GRU的数量分别是1和4为例,对机器翻译的BLEU指标进行评测,相关评测数据如下表一所示。其中,BLEU指标用于评测机器翻译的效果,且BLEU指标越高,机器翻译的效果越好。
表一
Figure PCTCN2019121420-appb-000027
Figure PCTCN2019121420-appb-000028
表一中BLEU指标的括号中标注了本申请相对于标准的机器学习模型的增值,通常,增值超过1即可以认为机器翻译的效果提升显著,所以,本申请可以显著提升机器翻译的效果。
下面以上述三种节点均为GRU+1T-GRU或L-GRU+1T-GRU或GRU+4T-GRU或L-GRU+4T-GRU为例,对机器翻译的BLEU指标进行评测,相关评测数据如下表二所示。
表二
机器学习模型 中文-英文
GRU+1T-GRU 43.63
L-GRU+1T-GRU 44.41
GRU+4T-GRU 44.16
L-GRU+4T-GRU 45.04
对表二中的BLEU指标进行分析可知:
1)当三个节点均为L-GRU+1T-GRU时,相比于三个节点均为GRU+1T-GRU来说,BLEU指标增加了44.41-43.63=0.78;当三个节点均为L-GRU+4T-GRU时,相比于三个节点均为GRU+4T-GRU来说,BLEU指标增加了45.04-44.16=0.88。所以,将节点中的GRU修改为L-GRU可以提高机器翻译的准确性。
2)当三个节点均为GRU+1T-GRU时,相比于三个节点均为GRU+4T-GRU来说,BLEU指标增加了44.16-43.63=0.53;当三个节点均为L-GRU+1T-GRU时,相比于三个节点均为L-GRU+4T-GRU来说,BLEU指标增加了45.04-44.41=0.63。所以,增加节点中T-GRU的数量可以提高机器翻译的准确性。
下面以上述三种节点为L-GRU+4T-GRU和GRU中的一种,对机器翻译的BLEU指标进行评测,相关评测数据如下表三所示。其中,√表示其对应的节点为L-GRU+4T-GRU,×表示其对应的节点为GRU。
表三
Figure PCTCN2019121420-appb-000029
Figure PCTCN2019121420-appb-000030
根据表三中的BLEU指标可知,当机器学习模型中三个节点均为L-GRU+4T-GRU时,其BLEU指标最高。所以,三个节点均为L-GRU+4T-GRU的机器学习模型的机器翻译效果最好。
应该理解的是,本申请各实施例中的各个步骤并不是必然按照步骤标号指示的顺序依次执行。除非本文中有明确的说明,这些步骤的执行并没有严格的顺序限制,这些步骤可以以其它的顺序执行。而且,各实施例中至少一部分步骤可以包括多个子步骤或者多个阶段,这些子步骤或者阶段并不必然是在同一时刻执行完成,而是可以在不同的时刻执行,这些子步骤或者阶段的执行顺序也不必然是依次进行,而是可以与其它步骤或者其它步骤的子步骤或者阶段的至少一部分轮流或者交替地执行。
请参考图13,其示出了本申请一些实施例提供的语句处理装置的结构框图,用于编码模型中,该编码模型包括级联的n个处理节点,该处理节点包括级联的一个第一单元和至少一个第二单元,n≥2。该语句处理装置,包括:
分词模块1310,用于对待编码的源语句进行分词运算,得到m个词汇,m≤n;
获取模块1320,用于利用n个处理节点中的第i个处理节点获取m个词汇中的第i个词汇,并获取第i-1个处理节点得到的第i-1个词汇向量,第i-1个词汇向量是m个词汇中的第i-1个词汇的编码向量,i≤m;
运算模块1330,用于利用第i个处理节点中的第一单元对第i个词汇和第i-1个词汇向量进行线性运算和非线性运算,将得到的第i个运算结果输出给至少一个第二单元进行处理,得到第i个词汇向量;
生成模块1340,用于在得到m个词汇向量后,根据m个词汇向量生成语句向量,语句向量用于确定目标语句或目标分类。
在一种可能的实现方式中,当编码模型是单向编码模型,且编码方向为从前往后时,
第i个处理节点是按照从前往后的顺序排列在n个处理节点中的第i个位置的处理节点;
第i个词汇是按照从前往后的顺序排列在m个词汇中的第i个位置的词汇。
在一种可能的实现方式中,当编码模型是单向编码模型,且编码方向为从后往前时,
第i个处理节点是按照从后往前的顺序排列在n个处理节点中的第i个位置的处理节点;
第i个词汇是按照从后往前的顺序排列在m个词汇中的第i个位置的词汇。
在一种可能的实现方式中,当编码模型是双向编码模型,且编码方向为从前往后和从后往前时,m≤n/2;
第i个处理节点包括按照从前往后的顺序排列在n个处理节点中的第i个位置的处理节点,以及按照从后往前的顺序排列在n个处理节点中的第i个位置的处理节点;
第i个词汇包括按照从前往后的顺序排列在m个词汇中的第i个位置的词汇,以及按照从后往前的顺序排列在m个词汇中的第i个位置的词汇。
在一种可能的实现方式中,运算模块1330,还用于:
利用第一单元将第i-1个词汇向量与第一差值进行元素积运算,得到第一乘积,第一差值等于预定数值减去第一单元的更新门控,更新门控用于衡量第i个词汇向量来自于第i个词汇和第i-1个词汇向量的比例;
利用第一单元通过线性变换函数对第i个词汇进行线性变换,将得到的线性变换函数值与线性变换门控进行元素积运算,得到第二乘积;通过双曲正切函数对第i个词汇和第i-1个词汇向量进行非线性变换,将得到的双曲正切函数值与第二乘积相加,得到候选激活函数值,线性变换门控用于控制候选激活函数值包含线性变换函数值;
利用第一单元将更新门控与候选激活函数值进行元素积运算,得到第三乘积;
利用第一单元将第一乘积与第三乘积相加,得到第i个运算结果。
综上所述,本申请实施例提供的语句处理装置,由于处理节点中的第一单元可以对第i个词汇和第i-1个词汇向量进行线性运算和非线性运算,所以,在对编码模型进行训练以得到该编码模型的权值时,该第一单元也会对训练数据进行线性运算和非线性运算后输出,这样,在反向传播输出与参考结果的误差时,该误差中包括线性运算部分和非线性运算部分的误差,由于线性运算部分的误差的梯度是常量,可以减缓整个误差的梯度的下降速度,也就改善了因为整个误差的梯度呈指数级下降直至消失时,导致编码模型的权值不准确的问题, 从而提高了语句处理的准确性。
请参考图14,其示出了本申请再一实施例提供的语句解码装置的结构框图,用于解码模型中,该解码模型包括一个处理节点,该处理节点包括级联的一个第一单元和至少一个第二单元。该语句解码装置,包括:
获取模块1410,用于在第j个时刻,获取语句向量和第j个查询状态,语句向量是编码模型对待编码的源语句进行编码后得到的,第j个查询状态用于查询第j个时刻时源语句中编码的部分;
生成模块1420,用于根据语句向量和第j个查询状态生成第j个源语言关注上下文,第j个源语言关注上下文用于指示第j个时刻时源语句中编码的部分;
运算模块1430,用于利用处理节点中的第一单元对第j个查询状态和第j个语言关注上下文进行线性运算和非线性运算,将得到的第j个运算结果输出给处理节点中的至少一个第二单元进行处理,得到第j个词汇;
生成模块1420,还用于在得到k个词汇后,根据k个词汇生成目标语句,j≤k。
在一种可能的实现方式中,解码模型还包括与处理节点相连的查询节点,查询节点包括一个第一单元;获取模块1410,还用于:
利用查询节点中的第一单元获取第j-1个解码状态和第j-1个词汇,第j-1个解码状态是处理节点根据第j-1个运算结果得到的,且第j-1个解码状态用于确定第j-1个词汇;
利用查询节点中的第一单元对第j-1个解码状态和第j-1个词汇进行线性运算和非线性运算,得到第j个查询状态。
在一种可能的实现方式中,解码模型还包括与处理节点相连的查询节点,查询节点包括一个第一单元和至少一个第二单元;获取模块1410,还用于:
利用查询节点中的第一单元获取第j-1个解码状态和第j-1个词汇,第j-1个解码状态是处理节点根据第j-1个运算结果得到的,且第j-1个解码状态用于确定第j-1个词汇;
利用查询节点中的第一单元对第j-1个解码状态和第j-1个词汇进行线性运算和非线性运算,将得到的中间运算结果输出给查询节点中的至少一个第二单元进行处理,得到第j个查询状态。
在一种可能的实现方式中,解码模型还包括与处理节点相连的查询节点,查询节点包括一个第三单元和至少一个第二单元;获取模块1410,还用于:
利用查询节点中的第三单元获取第j-1个解码状态和第j-1个词汇,第j-1 个解码状态是处理节点根据第j-1个运算结果得到的,且第j-1个解码状态用于确定第j-1个词汇;
利用查询节点中的第三单元对第j-1个解码状态和第j-1个词汇进行非线性运算,将得到的中间运算结果输出给查询节点中的至少一个第二单元进行处理,得到第j个查询状态。
在一种可能的实现方式中,解码模型还包括注意力运算节点,注意力运算节点分别与编码模型、查询节点和处理节点相连;生成模块1420,还用于:
利用注意力运算节点对语句向量和第j个查询状态进行注意力运算,得到第j个源语言关注上下文。
在一种可能的实现方式中,运算模块1430,还用于:
利用第一单元将第j个查询状态与第一差值进行元素积运算,得到第一乘积,第一差值等于预定数值减去第一单元的更新门控,更新门控用于衡量第j个语言关注上下文向量来自于第j个语言关注上下文和第j个查询状态的比例;
利用第一单元通过线性变换函数对第j个语言关注上下文进行线性变换,将得到的线性变换函数值与线性变换门控进行元素积运算,得到第二乘积;通过双曲正切函数对第j个语言关注上下文和第j个查询状态进行非线性变换,将得到的双曲正切函数值与第二乘积相加,得到候选激活函数值,线性变换门控用于控制候选激活函数值包含线性变换函数值;
利用第一单元将更新门控与候选激活函数值进行元素积运算,得到第三乘积;
利用第一单元将第一乘积与第三乘积相加,得到第j个运算结果。
综上所述,本申请实施例提供的语句解码装置,由于处理节点中的第一单元可以对第j个查询状态和第j个源语言关注上下文进行线性运算和非线性运算,所以,在对解码模型进行训练以得到该解码模型的权值时,该第一单元也会对训练数据进行线性运算和非线性运算后输出,这样,在反向传播输出与参考结果的误差时,该误差中包括线性运算部分和非线性运算部分的误差,由于线性运算部分的误差的梯度是常量,可以减缓整个误差的梯度的下降速度,也就改善了因为整个误差的梯度呈指数级下降直至消失时,导致解码模型的权值不准确的问题,从而提高了语句处理的准确性。
本申请还提供了一种服务器,该服务器包括处理器和存储器,存储器中存储有至少一条指令,至少一条指令由处理器加载并执行以实现上述各个方法实施例提供的语句处理方法或语句解码方法。需要说明的是,该服务器可以是如 下图15所提供的服务器。
请参考图15,其示出了本申请一个示例性实施例提供的服务器的结构示意图。具体来讲:所述服务器1500包括中央处理单元(CPU)1501、包括随机存取存储器(RAM)1502和只读存储器(ROM)1503的系统存储器1504,以及连接系统存储器1504和中央处理单元1501的系统总线1505。所述服务器1500还包括帮助计算机内的各个器件之间传输信息的基本输入/输出系统(I/O系统)1506,和用于存储操作系统1513、应用程序1514和其他程序模块1515的大容量存储设备1507。
所述基本输入/输出系统1506包括有用于显示信息的显示器1508和用于用户输入信息的诸如鼠标、键盘之类的输入设备1509。其中所述显示器1508和输入设备1509都通过连接到系统总线1505的输入输出控制器1510连接到中央处理单元1501。所述基本输入/输出系统1506还可以包括输入输出控制器1510以用于接收和处理来自键盘、鼠标、或电子触控笔等多个其他设备的输入。类似地,输入输出控制器1510还提供输出到显示屏、打印机或其他类型的输出设备。
所述大容量存储设备1507通过连接到系统总线1505的大容量存储控制器(未示出)连接到中央处理单元1501。所述大容量存储设备1507及其相关联的计算机可读存储介质为服务器1500提供非易失性存储。也就是说,所述大容量存储设备1507可以包括诸如硬盘或者CD-ROI驱动器之类的计算机可读存储介质(未示出)。
不失一般性,所述计算机可读存储介质可以包括计算机存储介质和通信介质。计算机存储介质包括以用于存储诸如计算机可读指令、数据结构、程序模块或其他数据等信息的任何方法或技术实现的易失性和非易失性、可移动和不可移动介质。计算机存储介质包括RAM、ROM、EPROM、EEPROM、闪存或其他固态存储其技术,CD-ROM、DVD或其他光学存储、磁带盒、磁带、磁盘存储或其他磁性存储设备。当然,本领域技术人员可知所述计算机存储介质不局限于上述几种。上述的系统存储器1504和大容量存储设备1507可以统称为存储器。
存储器存储有一个或多个程序,一个或多个程序被配置成由一个或多个中央处理单元1501执行,一个或多个程序包含用于实现上述语句编码或语句解码方法的指令,中央处理单元1501执行该一个或多个程序实现上述各个方法实施例提供的语句处理方法或语句解码方法。
根据本发明的各种实施例,所述服务器1500还可以通过诸如因特网等网络 连接到网络上的远程计算机运行。也即服务器1500可以通过连接在所述系统总线1505上的网络接口单元1511连接到网络1512,或者说,也可以使用网络接口单元1511来连接到其他类型的网络或远程计算机系统(未示出)。
所述存储器还包括一个或者一个以上的程序,所述一个或者一个以上程序存储于存储器中,所述一个或者一个以上程序包含用于进行本发明实施例提供的语句处理方法或语句解码方法中由服务器所执行的步骤。
本申请实施例还提供一种计算机可读存储介质,该存储介质中存储有至少一条指令、至少一段程序、代码集或指令集,所述至少一条指令、所述至少一段程序、所述代码集或指令集由所述处理器1510加载并执行以实现如上所述的语句处理方法或语句解码方法。
本申请还提供了一种计算机程序产品,当计算机程序产品在计算机上运行时,使得计算机执行上述各个方法实施例提供的语句处理方法或语句解码方法。
本申请一些实施例提供了一种计算机可读存储介质,所述存储介质中存储有至少一条指令、至少一段程序、代码集或指令集,所述至少一条指令、所述至少一段程序、所述代码集或指令集由处理器加载并执行以实现如上所述的语句处理方法或语句解码方法。
需要说明的是:上述实施例提供的语句编解码装置在进行语句编码或语句解码时,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将语句编解码装置的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。另外,上述实施例提供的语句编解码装置与语句编解码方法实施例属于同一构思,其具体实现过程详见方法实施例,这里不再赘述。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,可以通过硬件来完成,可以通过计算机程序来指令相关的硬件来完成,计算机程序可存储于一非易失性计算机可读取存储介质中,该计算机程序在执行时,可包括如上述各方法的实施例的流程。其中,本申请所提供的各实施例中所使用的对存储器、存储、数据库或其它介质的任何引用,均可包括非易失性和/或易失性存储器。非易失性存储器可包括只读存储器(ROM)、可编程ROM(PROM)、电可编程ROM(EPROM)、电可擦除可编程ROM(EEPROM)或闪存。易失性存储器可包括随机存取存储器(RAM)或者外部高速缓冲存储器。作为说明而非局限,RAM以多种形式可得,诸如静态RAM(SRAM)、动态RAM(DRAM)、同步DRAM(SDRAM)、双数据率SDRAM(DDRSDRAM)、 增强型SDRAM(ESDRAM)、同步链路(Synchlink)DRAM(SLDRAM)、存储器总线(Rambus)直接RAM(RDRAM)、直接存储器总线动态RAM(DRDRAM)、以及存储器总线动态RAM(RDRAM)等。
以上所述并不用以限制本申请实施例,凡在本申请实施例的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本申请实施例的保护范围之内。

Claims (27)

  1. 一种语句处理方法,由语句处理设备执行,用于编码模型中,所述编码模型包括级联的n个处理节点,所述处理节点包括级联的第一单元和至少一个第二单元,n≥2;包括:
    对待编码的源语句进行分词运算,得到m个词汇,m≤n;
    利用所述n个处理节点中的第i个处理节点获取所述m个词汇中的第i个词汇,并获取第i-1个处理节点得到的第i-1个词汇向量,所述第i-1个词汇向量是所述m个词汇中的第i-1个词汇的编码向量,i≤m;
    利用所述第i个处理节点中的第一单元对所述第i个词汇和所述第i-1个词汇向量进行线性运算和非线性运算,将得到的第i个运算结果输出给所述至少一个第二单元进行处理,得到第i个词汇向量;及
    在得到m个词汇向量后,根据所述m个词汇向量生成语句向量,所述语句向量用于确定目标语句或目标分类。
  2. 根据权利要求1所述的方法,其特征在于,当所述编码模型是单向编码模型,且编码方向为从前往后时,
    所述第i个处理节点是按照从前往后的顺序排列在所述n个处理节点中的第i个位置的处理节点;及
    所述第i个词汇是按照从前往后的顺序排列在所述m个词汇中的第i个位置的词汇。
  3. 根据权利要求1所述的方法,其特征在于,当所述编码模型是单向编码模型,且编码方向为从后往前时,
    所述第i个处理节点是按照从后往前的顺序排列在所述n个处理节点中的第i个位置的处理节点;及
    所述第i个词汇是按照从后往前的顺序排列在所述m个词汇中的第i个位置的词汇。
  4. 根据权利要求1所述的方法,其特征在于,当所述编码模型是双向编码模型,且编码方向包括从前往后和从后往前时,m≤n/2;
    所述第i个处理节点包括按照从前往后的顺序排列在所述n个处理节点中的第i个位置的处理节点,以及按照从后往前的顺序排列在所述n个处理节点中的第i个位置的处理节点;及
    所述第i个词汇包括按照从前往后的顺序排列在所述m个词汇中的第i个位置的词汇,以及按照从后往前的顺序排列在所述m个词汇中的第i个位置的词汇。
  5. 根据权利要求1至4任一所述的方法,其特征在于,所述利用所述第i个处理节点中的第一单元对所述第i个词汇和所述第i-1个词汇向量进行线性运算和非线性运算,包括:
    利用所述第一单元将所述第i-1个词汇向量与第一差值进行元素积运算,得到第一乘积,所述第一差值等于预定数值减去所述第一单元的更新门控,所述更新门控用于衡量所述第i个词汇向量来自于所述第i个词汇和所述第i-1个词汇向量的比例;
    利用所述第一单元通过线性变换函数对所述第i个词汇进行线性变换,将得到的线性变换函数值与线性变换门控进行元素积运算,得到第二乘积;通过双曲正切函数对所述第i个词汇和所述第i-1个词汇向量进行非线性变换,将得到的双曲正切函数值与第二乘积相加,得到候选激活函数值,所述线性变换门控用于控制所述候选激活函数值包含所述线性变换函数值;
    利用所述第一单元将所述更新门控与所述候选激活函数值进行元素积运算,得到第三乘积;及
    利用所述第一单元将所述第一乘积与所述第三乘积相加,得到所述第i个运算结果。
  6. 根据权利要求1至4任一所述的方法,其特征在于,所述方法还包括利用解码模型进行解码的步骤,所述解码模型包括处理节点,所述处理节点包括级联的第一单元和至少一个第二单元,解码步骤包括:
    在第j个时刻,获取所述语句向量和第j个查询状态,所述第j个查询状态用于查询第j个时刻时所述源语句中编码的部分;
    根据所述语句向量和所述第j个查询状态生成第j个源语言关注上下文,所述第j个源语言关注上下文是第j个时刻时所述源语句中编码的部分;
    利用所述处理节点中的第一单元对所述第j个查询状态和所述第j个语言关注上下文进行线性运算和非线性运算,将得到的第j个运算结果输出给所述处理节点中的至少一个第二单元进行处理,得到第j个词汇;及
    在得到k个词汇后,根据所述k个词汇生成目标语句,j≤k。
  7. 一种语句解码方法,由语句处理设备执行,用于解码模型中,所述解码 模型包括处理节点,所述处理节点包括级联的第一单元和至少一个第二单元;所述方法包括:
    在第j个时刻,获取语句向量和第j个查询状态,所述语句向量是编码模型对待编码的源语句进行编码后得到的,所述第j个查询状态用于查询第j个时刻时所述源语句中编码的部分;
    根据所述语句向量和所述第j个查询状态生成第j个源语言关注上下文,所述第j个源语言关注上下文是第j个时刻时所述源语句中编码的部分;
    利用所述处理节点中的第一单元对所述第j个查询状态和所述第j个语言关注上下文进行线性运算和非线性运算,将得到的第j个运算结果输出给所述处理节点中的至少一个第二单元进行处理,得到第j个词汇;及
    在得到k个词汇后,根据所述k个词汇生成目标语句,j≤k。
  8. 根据权利要求7所述的方法,其特征在于,所述解码模型还包括与所述处理节点相连的查询节点,所述查询节点包括第一单元;所述获取第j个查询状态,包括:
    利用所述查询节点中的第一单元获取第j-1个解码状态和第j-1个词汇,所述第j-1个解码状态是所述处理节点根据第j-1个运算结果得到的,且所述第j-1个解码状态用于确定所述第j-1个词汇;及
    利用所述查询节点中的第一单元对所述第j-1个解码状态和所述第j-1个词汇进行线性运算和非线性运算,得到所述第j个查询状态。
  9. 根据权利要求7所述的方法,其特征在于,所述解码模型还包括与所述处理节点相连的查询节点,所述查询节点包括第一单元和至少一个第二单元;所述获取第j个查询状态,包括:
    利用所述查询节点中的第一单元获取第j-1个解码状态和第j-1个词汇,所述第j-1个解码状态是所述处理节点根据第j-1个运算结果得到的,且所述第j-1个解码状态用于确定所述第j-1个词汇;及
    利用所述查询节点中的第一单元对所述第j-1个解码状态和所述第j-1个词汇进行线性运算和非线性运算,将得到的中间运算结果输出给所述查询节点中的至少一个第二单元进行处理,得到所述第j个查询状态。
  10. 根据权利要求7所述的方法,其特征在于,所述解码模型还包括与所述处理节点相连的查询节点,所述查询节点包括第三单元和至少一个第二单元;所 述获取第j个查询状态,包括:
    利用所述查询节点中的第三单元获取第j-1个解码状态和第j-1个词汇,所述第j-1个解码状态是所述处理节点根据第j-1个运算结果得到的,且所述第j-1个解码状态用于确定所述第j-1个词汇;及
    利用所述查询节点中的第三单元对所述第j-1个解码状态和所述第j-1个词汇进行非线性运算,将得到的中间运算结果输出给所述查询节点中的至少一个第二单元进行处理,得到所述第j个查询状态。
  11. 根据权利要求8至10任一所述的方法,其特征在于,所述解码模型还包括注意力运算节点,所述注意力运算节点分别与所述编码模型、所述查询节点和所述处理节点相连;所述根据所述语句向量和所述第j个查询状态生成第j个源语言关注上下文,包括:
    利用所述注意力运算节点对所述语句向量和所述第j个查询状态进行注意力运算,得到所述第j个源语言关注上下文。
  12. 根据权利要求7所述的方法,其特征在于,所述利用所述处理节点中的第一单元对所述第j个查询状态和所述第j个语言关注上下文进行线性运算和非线性运算,包括:
    利用所述第一单元将所述第j个查询状态与第一差值进行元素积运算,得到第一乘积,所述第一差值等于预定数值减去所述第一单元的更新门控,所述更新门控用于衡量所述第j个语言关注上下文向量来自于所述第j个语言关注上下文和所述第j个查询状态的比例;
    利用所述第一单元通过线性变换函数对所述第j个语言关注上下文进行线性变换,将得到的线性变换函数值与线性变换门控进行元素积运算,得到第二乘积;通过双曲正切函数对所述第j个语言关注上下文和所述第j个查询状态进行非线性变换,将得到的双曲正切函数值与第二乘积相加,得到候选激活函数值,所述线性变换门控用于控制所述候选激活函数值包含所述线性变换函数值;
    利用所述第一单元将所述更新门控与所述候选激活函数值进行元素积运算,得到第三乘积;及
    利用所述第一单元将所述第一乘积与所述第三乘积相加,得到所述第j个运算结果。
  13. 一种语句处理装置,用于编码模型中,所述编码模型包括级联的n个处 理节点,所述处理节点包括级联的第一单元和至少一个第二单元,n≥2;包括:
    分词模块,用于对待编码的源语句进行分词运算,得到m个词汇,m≤n;
    获取模块,用于利用所述n个处理节点中的第i个处理节点获取所述m个词汇中的第i个词汇,并获取第i-1个处理节点得到的第i-1个词汇向量,所述第i-1个词汇向量是所述m个词汇中的第i-1个词汇的编码向量,i≤m;
    运算模块,用于利用所述第i个处理节点中的第一单元对所述第i个词汇和所述第i-1个词汇向量进行线性运算和非线性运算,将得到的第i个运算结果输出给所述至少一个第二单元进行处理,得到第i个词汇向量;及
    生成模块,用于在得到m个词汇向量后,根据所述m个词汇向量生成语句向量,所述语句向量用于确定目标语句或目标分类。
  14. 一种语句解码装置,用于解码模型中,所述解码模型包括一个处理节点,所述处理节点包括级联的第一单元和至少一个第二单元;包括:
    获取模块,用于在第j个时刻,获取语句向量和第j个查询状态,所述语句向量是编码模型对待编码的源语句进行编码后得到的,所述第j个查询状态用于查询第j个时刻时所述源语句中编码的部分;
    生成模块,用于根据所述语句向量和所述第j个查询状态生成第j个源语言关注上下文,所述第j个源语言关注上下文是第j个时刻时所述源语句中编码的部分;
    运算模块,用于利用所述处理节点中的第一单元对所述第j个查询状态和所述第j个语言关注上下文进行线性运算和非线性运算,将得到的第j个运算结果输出给所述处理节点中的至少一个第二单元进行处理,得到第j个词汇;及
    所述生成模块,还用于在得到k个词汇后,根据所述k个词汇生成目标语句,j≤k。
  15. A sentence processing device, comprising a memory and a processor, the memory storing computer-readable instructions and an encoding model, the encoding model comprising n cascaded processing nodes, each processing node comprising a first unit and at least one second unit that are cascaded, n≥2, and the computer-readable instructions, when executed by the processor, causing the processor to perform the following steps:
    performing a word segmentation operation on a source sentence to be encoded, to obtain m words, m≤n;
    obtaining, by using an i-th processing node among the n processing nodes, an i-th word among the m words, and obtaining an (i-1)-th word vector obtained by an (i-1)-th processing node, the (i-1)-th word vector being an encoded vector of the (i-1)-th word among the m words, i≤m;
    performing, by using the first unit in the i-th processing node, a linear operation and a nonlinear operation on the i-th word and the (i-1)-th word vector, and outputting the obtained i-th operation result to the at least one second unit for processing, to obtain an i-th word vector; and
    generating, after m word vectors are obtained, a sentence vector according to the m word vectors, the sentence vector being used to determine a target sentence or a target category.
  16. The device according to claim 15, wherein, when the encoding model is a unidirectional encoding model and the encoding direction is from front to back:
    the i-th processing node is the processing node at the i-th position among the n processing nodes arranged in front-to-back order; and
    the i-th word is the word at the i-th position among the m words arranged in front-to-back order.
  17. The device according to claim 15, wherein, when the encoding model is a unidirectional encoding model and the encoding direction is from back to front:
    the i-th processing node is the processing node at the i-th position among the n processing nodes arranged in back-to-front order; and
    the i-th word is the word at the i-th position among the m words arranged in back-to-front order.
  18. The device according to claim 15, wherein, when the encoding model is a bidirectional encoding model and the encoding directions include front-to-back and back-to-front, m≤n/2;
    the i-th processing node comprises the processing node at the i-th position among the n processing nodes arranged in front-to-back order and the processing node at the i-th position among the n processing nodes arranged in back-to-front order; and
    the i-th word comprises the word at the i-th position among the m words arranged in front-to-back order and the word at the i-th position among the m words arranged in back-to-front order.
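The indexing in claim 18 can be pictured with a toy example. The pairing below, which simply walks the word list from both ends, is illustrative only and is not a feature recited in the claim.

    def bidirectional_pairs(words, n_nodes):
        m = len(words)
        assert 2 * m <= n_nodes          # bidirectional encoding requires m <= n/2
        # i-th word front-to-back paired with i-th word back-to-front
        return [(words[i], words[m - 1 - i]) for i in range(m)]

    # e.g. bidirectional_pairs(["w1", "w2", "w3"], 6)
    # -> [("w1", "w3"), ("w2", "w2"), ("w3", "w1")]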
  19. The device according to any one of claims 15 to 18, wherein the performing, by using the first unit in the i-th processing node, a linear operation and a nonlinear operation on the i-th word and the (i-1)-th word vector comprises:
    performing, by using the first unit, an element-wise product operation on the (i-1)-th word vector and a first difference to obtain a first product, the first difference being equal to a predetermined value minus an update gate of the first unit, and the update gate being used to measure the proportions in which the i-th word vector is derived from the i-th word and from the (i-1)-th word vector;
    performing, by using the first unit, a linear transformation on the i-th word through a linear transformation function, and performing an element-wise product operation on the obtained linear transformation function value and a linear transformation gate to obtain a second product; performing a nonlinear transformation on the i-th word and the (i-1)-th word vector through a hyperbolic tangent function, and adding the obtained hyperbolic tangent function value to the second product to obtain a candidate activation value, the linear transformation gate being used to control the inclusion of the linear transformation function value in the candidate activation value;
    performing, by using the first unit, an element-wise product operation on the update gate and the candidate activation value to obtain a third product; and
    adding, by using the first unit, the first product to the third product to obtain the i-th operation result.
  20. The device according to any one of claims 15 to 18, wherein the memory further stores a decoding model, the decoding model comprising a processing node, the processing node comprising a first unit and at least one second unit that are cascaded, and the computer-readable instructions, when executed by the processor, further causing the processor to perform the following steps:
    obtaining, at a j-th time step, the sentence vector and a j-th query state, the j-th query state being used to query the encoded part of the source sentence at the j-th time step;
    generating a j-th source-language attention context according to the sentence vector and the j-th query state, the j-th source-language attention context being the encoded part of the source sentence at the j-th time step;
    performing, by using the first unit in the processing node, a linear operation and a nonlinear operation on the j-th query state and the j-th source-language attention context, and outputting the obtained j-th operation result to the at least one second unit in the processing node for processing, to obtain a j-th word; and
    generating, after k words are obtained, a target sentence according to the k words, j≤k.
  21. A sentence processing device, comprising a memory and a processor, the memory storing computer-readable instructions and a decoding model, the decoding model comprising a processing node, the processing node comprising a first unit and at least one second unit that are cascaded, and the computer-readable instructions, when executed by the processor, causing the processor to perform the following steps:
    obtaining, at a j-th time step, a sentence vector and a j-th query state, the sentence vector being obtained by an encoding model by encoding a source sentence to be encoded, and the j-th query state being used to query the encoded part of the source sentence at the j-th time step;
    generating a j-th source-language attention context according to the sentence vector and the j-th query state, the j-th source-language attention context being the encoded part of the source sentence at the j-th time step;
    performing, by using the first unit in the processing node, a linear operation and a nonlinear operation on the j-th query state and the j-th source-language attention context, and outputting the obtained j-th operation result to the at least one second unit in the processing node for processing, to obtain a j-th word; and
    generating, after k words are obtained, a target sentence according to the k words, j≤k.
  22. The device according to claim 21, wherein the decoding model further comprises a query node connected to the processing node, the query node comprising a first unit; and the obtaining a j-th query state comprises:
    obtaining, by using the first unit in the query node, a (j-1)-th decoding state and a (j-1)-th word, the (j-1)-th decoding state being obtained by the processing node according to a (j-1)-th operation result, and the (j-1)-th decoding state being used to determine the (j-1)-th word; and
    performing, by using the first unit in the query node, a linear operation and a nonlinear operation on the (j-1)-th decoding state and the (j-1)-th word, to obtain the j-th query state.
  23. The device according to claim 21, wherein the decoding model further comprises a query node connected to the processing node, the query node comprising a first unit and at least one second unit; and the obtaining a j-th query state comprises:
    obtaining, by using the first unit in the query node, a (j-1)-th decoding state and a (j-1)-th word, the (j-1)-th decoding state being obtained by the processing node according to a (j-1)-th operation result, and the (j-1)-th decoding state being used to determine the (j-1)-th word; and
    performing, by using the first unit in the query node, a linear operation and a nonlinear operation on the (j-1)-th decoding state and the (j-1)-th word, and outputting the obtained intermediate operation result to the at least one second unit in the query node for processing, to obtain the j-th query state.
  24. The device according to claim 21, wherein the decoding model further comprises a query node connected to the processing node, the query node comprising a third unit and at least one second unit; and the obtaining a j-th query state comprises:
    obtaining, by using the third unit in the query node, a (j-1)-th decoding state and a (j-1)-th word, the (j-1)-th decoding state being obtained by the processing node according to a (j-1)-th operation result, and the (j-1)-th decoding state being used to determine the (j-1)-th word; and
    performing, by using the third unit in the query node, a nonlinear operation on the (j-1)-th decoding state and the (j-1)-th word, and outputting the obtained intermediate operation result to the at least one second unit in the query node for processing, to obtain the j-th query state.
  25. The device according to any one of claims 22 to 24, wherein the decoding model further comprises an attention operation node, the attention operation node being connected to the encoding model, the query node, and the processing node respectively; and the generating a j-th source-language attention context according to the sentence vector and the j-th query state comprises:
    performing, by using the attention operation node, an attention operation on the sentence vector and the j-th query state, to obtain the j-th source-language attention context.
  26. The device according to claim 21, wherein the performing, by using the first unit in the processing node, a linear operation and a nonlinear operation on the j-th query state and the j-th source-language attention context comprises:
    performing, by using the first unit, an element-wise product operation on the j-th query state and a first difference to obtain a first product, the first difference being equal to a predetermined value minus an update gate of the first unit, and the update gate being used to measure the proportions in which the j-th source-language attention context vector is derived from the j-th source-language attention context and from the j-th query state;
    performing, by using the first unit, a linear transformation on the j-th source-language attention context through a linear transformation function, and performing an element-wise product operation on the obtained linear transformation function value and a linear transformation gate to obtain a second product; performing a nonlinear transformation on the j-th source-language attention context and the j-th query state through a hyperbolic tangent function, and adding the obtained hyperbolic tangent function value to the second product to obtain a candidate activation value, the linear transformation gate being used to control the inclusion of the linear transformation function value in the candidate activation value;
    performing, by using the first unit, an element-wise product operation on the update gate and the candidate activation value to obtain a third product; and
    adding, by using the first unit, the first product to the third product to obtain the j-th operation result.
  27. One or more non-volatile storage media storing computer-readable instructions, the computer-readable instructions, when executed by one or more processors, causing the one or more processors to perform the method according to any one of claims 1 to 6, or to perform at least one of the methods according to claim 7 or claim 12.
PCT/CN2019/121420 2018-11-29 2019-11-28 语句处理方法、语句解码方法、装置、存储介质及设备 WO2020108545A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2021516821A JP7229345B2 (ja) 2018-11-29 2019-11-28 文処理方法、文復号方法、装置、プログラム及び機器
US17/181,490 US20210174003A1 (en) 2018-11-29 2021-02-22 Sentence encoding and decoding method, storage medium, and device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811444710.8A CN110263304B (zh) 2018-11-29 2018-11-29 语句编码方法、语句解码方法、装置、存储介质及设备
CN201811444710.8 2018-11-29

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/181,490 Continuation US20210174003A1 (en) 2018-11-29 2021-02-22 Sentence encoding and decoding method, storage medium, and device

Publications (1)

Publication Number Publication Date
WO2020108545A1 true WO2020108545A1 (zh) 2020-06-04

Family

ID=67911885

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/121420 WO2020108545A1 (zh) 2018-11-29 2019-11-28 语句处理方法、语句解码方法、装置、存储介质及设备

Country Status (4)

Country Link
US (1) US20210174003A1 (zh)
JP (1) JP7229345B2 (zh)
CN (1) CN110263304B (zh)
WO (1) WO2020108545A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110263304B (zh) * 2018-11-29 2023-01-10 腾讯科技(深圳)有限公司 语句编码方法、语句解码方法、装置、存储介质及设备
CN112309405A (zh) * 2020-10-29 2021-02-02 平安科技(深圳)有限公司 多种声音事件的检测方法、装置、计算机设备及存储介质
CN113705652B (zh) * 2021-08-23 2024-05-28 西安交通大学 一种基于指针生成网络的任务型对话状态追踪系统及方法

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101682207B1 (ko) * 2010-08-23 2016-12-12 에스케이플래닛 주식회사 토큰 분리 및 번역 과정을 통합한 통합 디코딩 장치 및 그 방법
US10867597B2 (en) * 2013-09-02 2020-12-15 Microsoft Technology Licensing, Llc Assignment of semantic labels to a sequence of words using neural network architectures
US10606846B2 (en) 2015-10-16 2020-03-31 Baidu Usa Llc Systems and methods for human inspired simple question answering (HISQA)
CN107220231A (zh) * 2016-03-22 2017-09-29 索尼公司 用于自然语言处理的电子设备和方法以及训练方法
JP6633999B2 (ja) 2016-10-31 2020-01-22 日本電信電話株式会社 符号器学習装置、変換装置、方法、及びプログラム
KR20180077847A (ko) * 2016-12-29 2018-07-09 주식회사 엔씨소프트 문장 검증 장치 및 방법
AU2018214675B2 (en) * 2017-02-06 2022-08-04 Thomson Reuters Enterprise Centre Gmbh Systems and methods for automatic semantic token tagging
CN107632981B (zh) * 2017-09-06 2020-11-03 沈阳雅译网络技术有限公司 一种引入源语组块信息编码的神经机器翻译方法
CN108304788B (zh) * 2018-01-18 2022-06-14 陕西炬云信息科技有限公司 基于深度神经网络的人脸识别方法
CN108399230A (zh) * 2018-02-13 2018-08-14 上海大学 一种基于卷积神经网络的中文财经新闻文本分类方法

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090210218A1 (en) * 2008-02-07 2009-08-20 Nec Laboratories America, Inc. Deep Neural Networks and Methods for Using Same
CN107391501A (zh) * 2017-09-11 2017-11-24 南京大学 一种基于词预测的神经机器翻译方法
CN107729329A (zh) * 2017-11-08 2018-02-23 苏州大学 一种基于词向量连接技术的神经机器翻译方法及装置
CN110263304A (zh) * 2018-11-29 2019-09-20 腾讯科技(深圳)有限公司 语句编码方法、语句解码方法、装置、存储介质及设备

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112560413A (zh) * 2020-12-15 2021-03-26 中国人寿保险股份有限公司 基于配置模式的报表扩展方法、装置和设备
CN112560413B (zh) * 2020-12-15 2023-08-04 中国人寿保险股份有限公司 基于配置模式的报表扩展方法、装置和设备

Also Published As

Publication number Publication date
CN110263304A (zh) 2019-09-20
US20210174003A1 (en) 2021-06-10
CN110263304B (zh) 2023-01-10
JP7229345B2 (ja) 2023-02-27
JP2022503812A (ja) 2022-01-12

Similar Documents

Publication Publication Date Title
US20210124878A1 (en) On-Device Projection Neural Networks for Natural Language Understanding
WO2021047286A1 (zh) 文本处理模型的训练方法、文本处理方法及装置
CN112712804B (zh) 语音识别方法、系统、介质、计算机设备、终端及应用
WO2020108545A1 (zh) 语句处理方法、语句解码方法、装置、存储介质及设备
WO2021233112A1 (zh) 基于多模态机器学习的翻译方法、装置、设备及存储介质
WO2022007823A1 (zh) 一种文本数据处理方法及装置
US9830315B1 (en) Sequence-based structured prediction for semantic parsing
WO2021196920A1 (zh) 智能问答方法、装置、设备及计算机可读存储介质
WO2022057776A1 (zh) 一种模型压缩方法及装置
WO2021190259A1 (zh) 一种槽位识别方法及电子设备
Li et al. Chinese grammatical error correction based on convolutional sequence to sequence model
CN111382257A (zh) 一种生成对话下文的方法和系统
CN110781302A (zh) 文本中事件角色的处理方法、装置、设备及存储介质
CN116049387A (zh) 一种基于图卷积的短文本分类方法、装置、介质
CN114445832A (zh) 基于全局语义的文字图像识别方法、装置及计算机设备
Fu et al. A CNN-LSTM network with attention approach for learning universal sentence representation in embedded system
CN108875024B (zh) 文本分类方法、系统、可读存储介质及电子设备
CN111949762B (zh) 基于上下文情感对话的方法和系统、存储介质
CN110888944A (zh) 基于多卷积窗尺寸注意力卷积神经网络实体关系抽取方法
US10706086B1 (en) Collaborative-filtering based user simulation for dialog systems
WO2023137903A1 (zh) 基于粗糙语义的回复语句确定方法、装置及电子设备
CN113420869B (zh) 基于全方向注意力的翻译方法及其相关设备
CN114519353A (zh) 模型的训练方法、情感消息生成方法和装置、设备、介质
Ni et al. Recurrent neural network based language model adaptation for accent mandarin speech
CN113077785A (zh) 一种端到端的多语言连续语音流语音内容识别方法及系统

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19889101

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021516821

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19889101

Country of ref document: EP

Kind code of ref document: A1