WO2021083312A1 - Method for training a sentence paraphrase model, sentence paraphrase method, and apparatus thereof - Google Patents

Method for training a sentence paraphrase model, sentence paraphrase method, and apparatus thereof

Info

Publication number
WO2021083312A1
Authority
WO
WIPO (PCT)
Prior art keywords
sentence
paraphrase
language
training data
training
Prior art date
Application number
PCT/CN2020/125131
Other languages
English (en)
French (fr)
Inventor
郭寅鹏
廖亿
蒋欣
张晴
张轶博
刘群
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2021083312A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/332 Query formulation
    • G06F 16/3329 Natural language query formulation or dialogue systems
    • G06F 16/3331 Query processing
    • G06F 16/334 Query execution
    • G06F 16/3344 Query execution using natural language analysis
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G06F 40/40 Processing or translation of natural language
    • G06F 40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation

Definitions

  • This application relates to the field of artificial intelligence, and more specifically, to a method for training a sentence paraphrase model, a sentence paraphrase method, and an apparatus thereof.
  • Artificial intelligence (AI) is the theory, methods, technologies, and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence.
  • Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the capabilities of perception, reasoning, and decision-making.
  • Paraphrasing refers to expressing the same semantics of a sentence in different ways. Paraphrases are very common in natural language, and in the field of natural language processing (NLP) they are used more and more widely. Therefore, how to obtain paraphrase sentences has become a technical problem that urgently needs to be solved.
  • In view of this, the present application provides a method for training a sentence paraphrase model, a sentence paraphrase method, and an apparatus thereof, with which paraphrase sentences can be obtained conveniently.
  • In a first aspect, a method for training a sentence paraphrase model is provided, including: obtaining training data, where the training data includes a plurality of sentences, the languages of the plurality of sentences are different, and the plurality of sentences have the same meaning; and training a sentence paraphrase model according to the training data, where the sentence paraphrase model is used to generate a paraphrase sentence of an input sentence based on the input sentence, the paraphrase sentence and the input sentence have the same meaning, and the language of the paraphrase sentence is the same as or different from the language of the input sentence.
  • In the embodiments of the present application, the training data includes multiple sentences that are in different languages but have the same meaning. The training data is used directly to train the sentence paraphrase model, without relying on a paraphrase corpus (pairs of paraphrase sentences) to train the model, which reduces the cost of training the sentence paraphrase model, so that paraphrase sentences can be obtained conveniently.
  • the sentence paraphrase model may be a language model.
  • the sentence paraphrase model may be a Transformer language model, a CNN language model, or an RNN language model, etc., or the sentence paraphrase model may also be another deep learning language model, which is not limited in this embodiment of the application.
  • the training data may include a sentence and one or more translations in different languages corresponding to the sentence.
  • the multiple sentences included in the training data may be sentences in a machine translation corpus.
  • the machine translation corpus mentioned here can be a multilingual parallel corpus, for example, a bilingual parallel (Chinese and English) machine translation corpus.
  • the training data may include a Chinese sentence and an English translation corresponding to the Chinese sentence.
  • the training data may also include language type indication information.
  • the language type indication information may be Chinese type indication information, English type indication information, French type indication information, or the like.
  • For example, when the training data includes a sequence composed of a Chinese sentence and the English translation corresponding to that Chinese sentence, the Chinese sentence in the sequence may be preceded by Chinese type indication information, which indicates that the following sentence is a Chinese sentence; for example, the Chinese type indication information may be "<zh>". Similarly, the English sentence (that is, the English translation corresponding to the Chinese sentence) may be preceded by English type indication information, which indicates that the following sentence is an English sentence; for example, the English type indication information may be "<en>".
  • the training data may also include sentences in other languages such as French, German, and Spanish.
  • the training data may include a Chinese sentence, a French translation corresponding to the sentence, and a German translation corresponding to the Chinese sentence.
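  • As an illustration of the training-data format described above, the following sketch (not part of the application; the tag strings and function names are illustrative assumptions) shows one plausible way to splice a sentence and its translations from a parallel corpus into a single training sequence, each preceded by its language type indication information:

```python
# Hypothetical sketch: assembling multilingual training sequences from a
# parallel corpus, with language-type indication tags before each sentence.

LANG_TAGS = {"zh": "<zh>", "en": "<en>", "fr": "<fr>"}

def build_training_sequence(parallel_example):
    """parallel_example: dict mapping a language code to a sentence, e.g.
    {"zh": "太阳离地球有多远", "en": "How far is the sun from the earth"}."""
    parts = []
    for lang, sentence in parallel_example.items():
        parts.append(f"{LANG_TAGS[lang]} {sentence}")
    # Splice (concatenate) the tagged sentences into one training sequence.
    return " ".join(parts)

corpus = [
    {"zh": "太阳离地球有多远", "en": "How far is the sun from the earth"},
]
training_data = [build_training_sequence(ex) for ex in corpus]
print(training_data[0])  # "<zh> 太阳离地球有多远 <en> How far is the sun from the earth"
```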
  • In some implementations, the sentence paraphrase model includes a language type indicator parameter, and the language type indicator parameter is used to indicate the language type of the paraphrase sentence generated by the sentence paraphrase model; the training of the sentence paraphrase model according to the training data then includes: training the sentence paraphrase model according to the training data and the language type indicator parameter.
  • In the embodiments of the present application, the sentence paraphrase model includes the language type indicator parameter, and the model is trained according to the training data and this parameter, so that the sentence paraphrase model can generate multiple paraphrase sentences in different languages according to the language type indicator parameter.
  • Optionally, the language type indicator parameter is a parameter of the sentence paraphrase model.
  • In some implementations, the language type of the paraphrase sentence is determined according to the language type indicator parameter, and the method further includes: acquiring language type indication information, where the language type indication information is used to determine the language type indicator parameter; and determining the language type indicator parameter according to the language type indication information.
  • The foregoing determination of the language type indicator parameter according to the language type indication information can be understood as setting the language type indicator parameter according to the language type indication information, so that the language type of the paraphrase sentence generated by the sentence paraphrase model can be controlled according to the set language type indicator parameter.
  • In the embodiments of the present application, the language type indicator parameter is set by the language type indication information, so that the language type of the paraphrase sentence generated by the sentence paraphrase model can be controlled flexibly and conveniently, as illustrated in the sketch below.
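  • As an illustration only, the following sketch shows one plausible realisation of such a language type indicator parameter as a learned per-language embedding selected by the language type indication information; the class name, tag strings, and shapes are assumptions for this example, not the application's own implementation:

```python
import torch
import torch.nn as nn

# Hypothetical sketch: a "language type indicator parameter" realised as a
# learned embedding, selected by the language type indication information and
# added to the decoder states so as to steer the output language.

LANG_TO_ID = {"<zh>": 0, "<en>": 1, "<fr>": 2}

class LanguageIndicator(nn.Module):
    def __init__(self, num_languages: int, hidden_size: int):
        super().__init__()
        # One learned vector per language: the indicator parameter.
        self.embedding = nn.Embedding(num_languages, hidden_size)

    def forward(self, token_states: torch.Tensor, language_indication_info: str) -> torch.Tensor:
        lang_id = torch.tensor(LANG_TO_ID[language_indication_info])
        # Broadcast the selected language embedding over all token positions.
        return token_states + self.embedding(lang_id)

indicator = LanguageIndicator(num_languages=3, hidden_size=8)
states = torch.zeros(5, 8)               # 5 token positions, hidden size 8
conditioned = indicator(states, "<en>")  # condition generation on English output
```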
  • In some implementations, at least one of the multiple sentences included in the training data is a sentence that has undergone disturbance processing, where the disturbance processing includes at least one of: randomly deleting a word in the sentence, randomly reversing the order of words in the sentence, and randomly inserting a word into the sentence.
  • In the embodiments of the present application, the disturbed training data is used to train the sentence paraphrase model, which can improve the robustness of the sentence paraphrase model, so that the generated paraphrase sentences have more accurate semantics and more diverse forms.
  • the disturbance processing may also include other disturbance processing performed on the sentence, which is not limited in the embodiment of the present application.
  • Optionally, the disturbance processing may use denoising auto-encoder (DAE) techniques, as sketched below.
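  • A minimal sketch of such disturbance operations, assuming whitespace-tokenised text and illustrative probabilities (not the application's actual procedure):

```python
import random

# Hypothetical sketch of the disturbance (noising) operations mentioned above:
# random word deletion, random local reversal of word order, and random word
# insertion, in the spirit of a denoising auto-encoder (DAE).

def disturb(sentence: str, p_delete=0.1, p_swap=0.1, p_insert=0.1) -> str:
    words = sentence.split()
    # Randomly delete words (keep the original if everything would be deleted).
    words = [w for w in words if random.random() > p_delete] or words
    # Randomly reverse (swap) adjacent words.
    for i in range(len(words) - 1):
        if random.random() < p_swap:
            words[i], words[i + 1] = words[i + 1], words[i]
    # Randomly insert a word drawn from the sentence itself.
    if words and random.random() < p_insert:
        pos = random.randrange(len(words) + 1)
        words.insert(pos, random.choice(words))
    return " ".join(words)

print(disturb("how far is the sun from the earth"))
```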
  • In some implementations, before the acquiring of the training data, the method further includes: acquiring pre-training data, where the pre-training data includes one or more sentences, and the language types of the sentences included in the pre-training data are one or more of the language types of the sentences included in the training data; and training the sentence paraphrase model according to the pre-training data.
  • In the embodiments of the present application, using pre-training data that includes at least one sentence to train the sentence paraphrase model can improve the fluency of the sentences (paraphrase sentences) generated by the sentence paraphrase model.
  • Optionally, when the pre-training data is used to train the sentence paraphrase model, the pre-training data may be used as the input of the sentence paraphrase model, and at the same time the sentences in the pre-training data may be used as the labels (that is, the true values) of the sentence paraphrase model; a backpropagation algorithm is used, with pre-training data continuously obtained for iterative training, until the loss function converges, at which point the training of the sentence paraphrase model is completed. A sketch of such a pre-training loop is given below.
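  • A minimal illustrative pre-training loop, assuming a Hugging Face-style causal language model whose forward pass returns a loss when labels are supplied; the function and variable names are assumptions, not the application's own code:

```python
import torch

# Each pre-training sentence serves both as the model input and as the label
# (the true value); backpropagation is iterated until the loss converges.

def pretrain(model, tokenizer, pretraining_sentences, epochs=3, lr=1e-4):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for sentence in pretraining_sentences:
            ids = tokenizer(sentence, return_tensors="pt")["input_ids"]
            loss = model(input_ids=ids, labels=ids).loss  # the sentence is its own label
            optimizer.zero_grad()
            loss.backward()   # backpropagation
            optimizer.step()
    return model
```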
  • In a second aspect, a sentence paraphrase method is provided, which includes: acquiring an input sentence; and paraphrasing the input sentence using a sentence paraphrase model to generate a paraphrase sentence of the input sentence, where the language type of the paraphrase sentence is the same as or different from the language type of the input sentence; wherein the sentence paraphrase model is obtained by training with training data, the training data includes multiple sentences, the language types of the multiple sentences are different, and the multiple sentences have the same meaning.
  • In the embodiments of the present application, the training data includes multiple sentences that are in different languages but have the same meaning. The sentence paraphrase model is obtained directly by training with this training data, without relying on a paraphrase corpus (pairs of paraphrase sentences), which can reduce the cost of training the sentence paraphrase model, so that paraphrase sentences can be obtained conveniently. A sketch of how such a model might be used at inference time is given after this paragraph.
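  • Hypothetical inference sketch (not the application's own code): the trained sentence paraphrase model receives the input sentence preceded by its language tag, followed by the tag of the desired output language, and then generates the paraphrase sentence in that language. A Hugging Face-style `generate` interface and the tag strings are assumptions for illustration:

```python
# Generate a paraphrase of the input sentence in the requested language.

def paraphrase(model, tokenizer, input_sentence, src_lang="<zh>", tgt_lang="<en>"):
    prompt = f"{src_lang} {input_sentence} {tgt_lang}"
    ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
    output_ids = model.generate(ids, max_new_tokens=50)
    # Keep only the newly generated tokens, i.e. the paraphrase sentence.
    return tokenizer.decode(output_ids[0, ids.shape[1]:], skip_special_tokens=True)

# Example call (hypothetical):
# paraphrase(model, tokenizer, "太阳离地球有多远", src_lang="<zh>", tgt_lang="<en>")
```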
  • the sentence paraphrase model may be a language model.
  • the sentence paraphrase model may be a Transformer language model, a CNN language model, or an RNN language model, etc., or the sentence paraphrase model may also be another deep learning language model, which is not limited in this embodiment of the application.
  • the training data may include a sentence and one or more translations in different languages corresponding to the sentence.
  • the multiple sentences included in the training data may be sentences in a machine translation corpus.
  • the machine translation corpus mentioned here can be a multilingual parallel corpus, for example, a bilingual parallel (Chinese and English) machine translation corpus.
  • the training data may include a Chinese sentence and an English translation corresponding to the Chinese sentence.
  • the training data may also include language type indication information.
  • the language type indication information may be Chinese type indication information, English type indication information, or French type indication information.
  • For example, when the training data includes a sequence composed of a Chinese sentence and the English translation corresponding to that Chinese sentence, the Chinese sentence in the sequence may be preceded by Chinese type indication information, which indicates that the following sentence is a Chinese sentence; for example, the Chinese type indication information may be "<zh>". Similarly, the English sentence (that is, the English translation corresponding to the Chinese sentence) may be preceded by English type indication information, which indicates that the following sentence is an English sentence; for example, the English type indication information may be "<en>".
  • the training data may also include sentences in other languages such as French, German, and Spanish.
  • the training data may include a Chinese sentence, a French translation corresponding to the sentence, and a German translation corresponding to the Chinese sentence.
  • the sentence paraphrase model includes a language type indicator parameter, and the language type indicator parameter is used to indicate the language type of the paraphrase sentence generated by the sentence paraphrase model.
  • the sentence paraphrase model is obtained after training according to the training data and the language indicator parameter.
  • In the embodiments of the present application, the sentence paraphrase model includes the language type indicator parameter, and the model is trained according to the training data and this parameter, so that the sentence paraphrase model can generate multiple paraphrase sentences in different languages according to the language type indicator parameter.
  • Optionally, the language type indicator parameter is a parameter of the sentence paraphrase model.
  • In some implementations, the language type of the paraphrase sentence is determined according to the language type indicator parameter, and the language type indicator parameter is determined according to the acquired language type indication information.
  • The foregoing determination of the language type indicator parameter according to the language type indication information can be understood as setting the language type indicator parameter according to the language type indication information, so that the language type of the paraphrase sentence generated by the sentence paraphrase model can be controlled according to the set language type indicator parameter.
  • In the embodiments of the present application, the language type indicator parameter is set by the language type indication information, so that the language type of the paraphrase sentence generated by the sentence paraphrase model can be controlled flexibly and conveniently.
  • In some implementations, at least one of the multiple sentences included in the training data is a sentence that has undergone disturbance processing, where the disturbance processing includes at least one of: randomly deleting a word in the sentence, randomly reversing the order of words in the sentence, and randomly inserting a word into the sentence.
  • In the embodiments of the present application, the disturbed training data is used to train the sentence paraphrase model, which can improve the robustness of the sentence paraphrase model, so that the generated paraphrase sentences have more accurate semantics and more diverse forms.
  • the disturbance processing may also include other disturbance processing performed on the sentence, which is not limited in the embodiment of the present application.
  • Optionally, the disturbance processing may use denoising auto-encoder (DAE) techniques.
  • In some implementations, the sentence paraphrase model is obtained by first training with pre-training data and then training with the training data, where the pre-training data includes one or more sentences, and the language types of the sentences included in the pre-training data are one or more of the language types of the sentences included in the training data.
  • In the embodiments of the present application, using pre-training data that includes at least one sentence to train the sentence paraphrase model can improve the fluency of the sentences (paraphrase sentences) generated by the sentence paraphrase model.
  • Optionally, when the pre-training data is used to train the sentence paraphrase model, the pre-training data may be used as the input of the sentence paraphrase model, and at the same time the sentences in the pre-training data may be used as the labels (that is, the true values) of the sentence paraphrase model; a backpropagation algorithm is used, with pre-training data continuously obtained for iterative training, until the loss function converges, at which point the training of the sentence paraphrase model is completed.
  • In a third aspect, an apparatus for training a sentence paraphrase model is provided, including: an acquisition module, configured to acquire training data, where the training data includes a plurality of sentences, the languages of the plurality of sentences are different, and the plurality of sentences have the same meaning; and a training module, configured to train a sentence paraphrase model based on the training data, where the sentence paraphrase model is used to generate a paraphrase sentence of an input sentence based on the input sentence, the paraphrase sentence and the input sentence have the same meaning, and the language of the paraphrase sentence is the same as or different from the language of the input sentence.
  • In the embodiments of the present application, the training data includes multiple sentences that are in different languages but have the same meaning. The training data is used directly to train the sentence paraphrase model, without relying on a paraphrase corpus (pairs of paraphrase sentences) to train the model, which reduces the cost of training the sentence paraphrase model, so that paraphrase sentences can be obtained conveniently.
  • the sentence paraphrase model may be a language model.
  • the sentence paraphrase model may be a Transformer language model, a CNN language model, or an RNN language model, etc., or the sentence paraphrase model may also be another deep learning language model, which is not limited in this embodiment of the application.
  • the training data may include a sentence and one or more translations in different languages corresponding to the sentence.
  • the multiple sentences included in the training data may be sentences in a machine translation corpus.
  • the machine translation corpus mentioned here can be a multilingual parallel corpus, for example, a bilingual parallel (Chinese and English) machine translation corpus.
  • the training data may include a Chinese sentence and an English translation corresponding to the Chinese sentence.
  • the training data may also include language type indication information.
  • the language type indication information may be Chinese type indication information, English type indication information, French type indication information, or the like.
  • For example, when the training data includes a sequence composed of a Chinese sentence and the English translation corresponding to that Chinese sentence, the Chinese sentence in the sequence may be preceded by Chinese type indication information, which indicates that the following sentence is a Chinese sentence; for example, the Chinese type indication information may be "<zh>". Similarly, the English sentence (that is, the English translation corresponding to the Chinese sentence) may be preceded by English type indication information, which indicates that the following sentence is an English sentence; for example, the English type indication information may be "<en>".
  • the training data may also include sentences in other languages such as French, German, and Spanish.
  • the training data may include a Chinese sentence, a French translation corresponding to the sentence, and a German translation corresponding to the Chinese sentence.
  • the sentence paraphrase model includes a language type indicator parameter, and the language type indicator parameter is used to indicate the language type of the paraphrase sentence generated by the sentence paraphrase model;
  • the training module is specifically configured to train the sentence paraphrase model according to the training data and the language indicator parameter.
  • In the embodiments of the present application, the sentence paraphrase model includes the language type indicator parameter, and the model is trained according to the training data and this parameter, so that the sentence paraphrase model can generate multiple paraphrase sentences in different languages according to the language type indicator parameter.
  • Optionally, the language type indicator parameter is a parameter of the sentence paraphrase model.
  • In some implementations, the language type of the paraphrase sentence is determined according to the language type indicator parameter; the acquisition module is further configured to acquire language type indication information, and the training module is further configured to determine the language type indicator parameter according to the language type indication information.
  • The foregoing determination of the language type indicator parameter according to the language type indication information can be understood as setting the language type indicator parameter according to the language type indication information, so that the language type of the paraphrase sentence generated by the sentence paraphrase model can be controlled according to the set language type indicator parameter.
  • In the embodiments of the present application, the language type indicator parameter is set by the language type indication information, so that the language type of the paraphrase sentence generated by the sentence paraphrase model can be controlled flexibly and conveniently.
  • In some implementations, at least one of the multiple sentences included in the training data is a sentence that has undergone disturbance processing, where the disturbance processing includes at least one of: randomly deleting a word in the sentence, randomly reversing the order of words in the sentence, and randomly inserting a word into the sentence.
  • In the embodiments of the present application, the disturbed training data is used to train the sentence paraphrase model, which can improve the robustness of the sentence paraphrase model, so that the generated paraphrase sentences have more accurate semantics and more diverse forms.
  • the disturbance processing may also include other disturbance processing performed on the sentence, which is not limited in the embodiment of the present application.
  • Optionally, the disturbance processing may use denoising auto-encoder (DAE) techniques.
  • In some implementations, the acquisition module is further configured to acquire pre-training data, where the pre-training data includes one or more sentences, and the language types of the sentences included in the pre-training data are one or more of the language types of the sentences included in the training data; and the apparatus further includes a pre-training module, configured to train the sentence paraphrase model according to the pre-training data.
  • using pre-training data including at least one sentence to train the sentence paraphrase model can improve the fluency of the sentence (paraphrase sentence) generated by the sentence paraphrase model.
  • Optionally, when the pre-training data is used to train the sentence paraphrase model, the pre-training data may be used as the input of the sentence paraphrase model, and at the same time the sentences in the pre-training data may be used as the labels (that is, the true values) of the sentence paraphrase model; a backpropagation algorithm is used, with pre-training data continuously obtained for iterative training, until the loss function converges, at which point the training of the sentence paraphrase model is completed.
  • In a fourth aspect, a sentence paraphrase apparatus is provided, including: an acquisition module, configured to acquire an input sentence; and a paraphrase module, configured to paraphrase the input sentence through a sentence paraphrase model to generate a paraphrase sentence of the input sentence, where the language type of the paraphrase sentence is the same as or different from the language type of the input sentence; wherein the sentence paraphrase model is obtained by training with training data, the training data includes a plurality of sentences, the languages of the plurality of sentences are different, and the plurality of sentences have the same meaning.
  • In the embodiments of the present application, the training data includes multiple sentences that are in different languages but have the same meaning. The sentence paraphrase model is obtained directly by training with this training data, without relying on a paraphrase corpus (pairs of paraphrase sentences), which can reduce the cost of training the sentence paraphrase model, so that paraphrase sentences can be obtained conveniently.
  • the sentence paraphrase model may be a language model.
  • the sentence paraphrase model may be a Transformer language model, a CNN language model, or an RNN language model, etc., or the sentence paraphrase model may also be another deep learning language model, which is not limited in this embodiment of the application.
  • the training data may include a sentence and one or more translations in different languages corresponding to the sentence.
  • the multiple sentences included in the training data may be sentences in a machine translation corpus.
  • the machine translation corpus mentioned here can be a multilingual parallel corpus, for example, a bilingual parallel (Chinese and English) machine translation corpus.
  • the training data may include a Chinese sentence and an English translation corresponding to the Chinese sentence.
  • the training data may also include language type indication information.
  • the language type indication information may be Chinese type indication information, English type indication information, French type indication information, or the like.
  • For example, when the training data includes a sequence composed of a Chinese sentence and the English translation corresponding to that Chinese sentence, the Chinese sentence in the sequence may be preceded by Chinese type indication information, which indicates that the following sentence is a Chinese sentence; for example, the Chinese type indication information may be "<zh>". Similarly, the English sentence (that is, the English translation corresponding to the Chinese sentence) may be preceded by English type indication information, which indicates that the following sentence is an English sentence; for example, the English type indication information may be "<en>".
  • the training data may also include sentences in other languages such as French, German, and Spanish.
  • the training data may include a Chinese sentence, a French translation corresponding to the sentence, and a German translation corresponding to the Chinese sentence.
  • the sentence paraphrase model includes a language type indicator parameter, and the language type indicator parameter is used to indicate the language type of the paraphrase sentence generated by the sentence paraphrase model.
  • the sentence paraphrase model is obtained after training according to the training data and the language indicator parameter.
  • In the embodiments of the present application, the sentence paraphrase model includes the language type indicator parameter, and the model is trained according to the training data and this parameter, so that the sentence paraphrase model can generate multiple paraphrase sentences in different languages according to the language type indicator parameter.
  • Optionally, the language type indicator parameter is a parameter of the sentence paraphrase model.
  • In some implementations, the language type of the paraphrase sentence is determined according to the language type indicator parameter, and the language type indicator parameter is determined according to the acquired language type indication information.
  • The foregoing determination of the language type indicator parameter according to the language type indication information can be understood as setting the language type indicator parameter according to the language type indication information, so that the language type of the paraphrase sentence generated by the sentence paraphrase model can be controlled according to the set language type indicator parameter.
  • In the embodiments of the present application, the language type indicator parameter is set by the language type indication information, so that the language type of the paraphrase sentence generated by the sentence paraphrase model can be controlled flexibly and conveniently.
  • In some implementations, at least one of the multiple sentences included in the training data is a sentence that has undergone disturbance processing, where the disturbance processing includes at least one of: randomly deleting a word in the sentence, randomly reversing the order of words in the sentence, and randomly inserting a word into the sentence.
  • In the embodiments of the present application, the disturbed training data is used to train the sentence paraphrase model, which can improve the robustness of the sentence paraphrase model, so that the generated paraphrase sentences have more accurate semantics and more diverse forms.
  • the disturbance processing may also include other disturbance processing performed on the sentence, which is not limited in the embodiment of the present application.
  • Optionally, the disturbance processing may use denoising auto-encoder (DAE) techniques.
  • In some implementations, the sentence paraphrase model is obtained by first training with pre-training data and then training with the training data, where the pre-training data includes one or more sentences, and the language types of the sentences included in the pre-training data are one or more of the language types of the sentences included in the training data.
  • using pre-training data including at least one sentence to train the sentence paraphrase model can improve the fluency of the sentence (paraphrase sentence) generated by the sentence paraphrase model.
  • Optionally, when the pre-training data is used to train the sentence paraphrase model, the pre-training data may be used as the input of the sentence paraphrase model, and at the same time the sentences in the pre-training data may be used as the labels (that is, the true values) of the sentence paraphrase model; a backpropagation algorithm is used, with pre-training data continuously obtained for iterative training, until the loss function converges, at which point the training of the sentence paraphrase model is completed.
  • In a fifth aspect, an apparatus for training a sentence paraphrase model is provided, including: a memory, configured to store a program; and a processor, configured to execute the program stored in the memory, where when the program stored in the memory is executed, the processor is configured to execute the method in any one of the implementation manners of the foregoing first aspect.
  • In a sixth aspect, a sentence paraphrase apparatus is provided, including: a memory, configured to store a program; and a processor, configured to execute the program stored in the memory, where when the program stored in the memory is executed, the processor is configured to execute the method in any one of the implementation manners of the foregoing second aspect.
  • The processors in the fifth and sixth aspects mentioned above may be a central processing unit (CPU), or a combination of a CPU and a neural network computing processor, where the neural network computing processor may include a graphics processing unit (GPU), a neural-network processing unit (NPU), a tensor processing unit (TPU), and so on.
  • TPU is an artificial intelligence accelerator application specific integrated circuit fully customized by Google for machine learning.
  • In a seventh aspect, a computer-readable medium is provided, which stores program code for execution by a device, where the program code includes instructions for executing the method in any one of the implementation manners of the foregoing first aspect or second aspect.
  • In an eighth aspect, a computer program product containing instructions is provided; when the computer program product runs on a computer, the computer executes the method in any one of the implementation manners of the foregoing first aspect or second aspect.
  • In a ninth aspect, a chip is provided, which includes a processor and a data interface, where the processor reads, through the data interface, instructions stored in a memory, to execute the method in any one of the implementation manners of the foregoing first aspect or second aspect.
  • the chip may further include a memory in which instructions are stored, and the processor is configured to execute instructions stored on the memory.
  • the processor is configured to execute the method in any one of the implementation manners of the first aspect or the second aspect.
  • the aforementioned chip may specifically be a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC).
  • In a tenth aspect, an electronic device is provided, where the electronic device includes the apparatus for training a sentence paraphrase model in any one of the implementation manners of the foregoing third aspect, or the electronic device includes the sentence paraphrase apparatus in any one of the implementation manners of the foregoing fourth aspect.
  • the electronic device may specifically be a server.
  • the electronic device may specifically be a terminal device.
  • In the embodiments of the present application, the training data includes multiple sentences that are in different languages but have the same meaning. The training data is used directly to train the sentence paraphrase model, without relying on a paraphrase corpus (pairs of paraphrase sentences) to train the model, which reduces the cost of training the sentence paraphrase model, so that paraphrase sentences can be obtained conveniently.
  • FIG. 1 is a schematic diagram of an application scenario of natural language processing provided by an embodiment of the present application
  • FIG. 2 is a schematic diagram of another application scenario of natural language processing provided by an embodiment of the present application.
  • Fig. 3 is a schematic diagram of a natural language processing related device provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of the hardware structure of a chip provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of an application scenario provided by an embodiment of the present application.
  • FIG. 7 is a schematic flowchart of a method for training a sentence paraphrase model provided by an embodiment of the present application.
  • FIG. 8 is a schematic flowchart of a method for training a sentence paraphrase model provided by another embodiment of the present application.
  • FIG. 9 is a schematic block diagram of a training sentence paraphrase model provided by an embodiment of the present application.
  • FIG. 10 is a schematic block diagram of a training sentence paraphrase model provided by another embodiment of the present application.
  • FIG. 11 is a schematic flowchart of a sentence paraphrase method provided by an embodiment of the present application.
  • FIG. 12 is a schematic diagram of the hardware structure of a sentence paraphrase apparatus according to an embodiment of the present application.
  • FIG. 13 is a schematic diagram of the hardware structure of an apparatus for training a sentence paraphrase model according to an embodiment of the present application.
  • Figure 1 shows a natural language processing system that includes user equipment and data processing equipment.
  • user equipment includes smart terminals such as mobile phones, personal computers, or information processing centers.
  • The user equipment is the initiator of natural language data processing and serves as the initiator of requests such as language question-answering or queries; usually, the user initiates such requests through the user equipment.
  • the above-mentioned data processing device may be a device or server with data processing functions such as a cloud server, a network server, an application server, and a management server.
  • The data processing device receives query sentences or voice/text question sentences from the smart terminal through an interactive interface, and then performs language data processing such as machine learning, deep learning, searching, reasoning, and decision-making by means of a memory for storing data and a processor for processing data.
  • the memory in the data processing device can be a general term, including a database for local storage and storing historical data.
  • the database can be on the data processing device or on other network servers.
  • For example, the user device may receive an instruction from the user requesting that an input sentence (for example, a sentence entered by the user) be paraphrased to obtain a paraphrase sentence (for example, a different expression, obtained by paraphrasing, that has the same semantics as the input sentence), and then send the input sentence to the data processing device, so that the data processing device paraphrases the input sentence to obtain the paraphrase sentence.
  • The data processing device can execute the sentence paraphrase method of the embodiments of the present application.
  • Paraphrase refers to different expressions of the same semantics of the input sentence.
  • For example, if the input sentence is "how far is the distance from the sun to the earth", paraphrasing the input sentence can yield paraphrase sentences such as "how far is the sun from the earth", "how many kilometers is it from the earth to the sun", "how many kilometers is the earth from the sun", "how far is the earth from the sun", and "what is the distance between the earth and the sun". These paraphrase sentences all express the same or similar semantics as the input sentence, namely the distance between the sun and the earth, so they can be called paraphrase sentences of the input sentence.
  • Paraphrases may exist at different levels, such as the vocabulary level, the phrase level, and the sentence level; that is, the input sentence and the paraphrase sentence may each be a word, a phrase, or a sentence, which is not limited in the embodiments of the present application.
  • vocabulary-level paraphrases are commonly referred to as synonyms.
  • For example, vocabulary-level paraphrases can include: "tomato" and "tomato" (two different Chinese words with the same meaning in the original text), and "car" and "vehicle".
  • For example, phrase-level paraphrases can include: "Peking University" and "Beijing University", and "consider" and "take ... into consideration".
  • For example, sentence-level paraphrases can include: "How tall is Yao Ming?" and "What is Yao Ming's height?", and "Messi plays for FC Barcelona in the Spanish Primera League." and "Messi is a player of Barca in La Liga."
  • the language type (or language) of the input sentence and its paraphrase sentence is not limited in the embodiments of the present application.
  • the input sentence and the paraphrase sentence can be in various languages such as Chinese, English, German, French, etc., which are not limited in the embodiments of the present application.
  • the input sentence and the paraphrase sentence may be Chinese; or, the input sentence and the paraphrase sentence may be English, which is not limited in the embodiment of the present application.
  • Figure 2 shows another natural language processing system.
  • the user equipment is directly used as a data processing device.
  • the user equipment can directly receive input from the user and process it directly by the hardware of the user equipment itself.
  • The processing is similar to that in Figure 1; reference may be made to the above description, which will not be repeated here.
  • the user equipment can receive a user's instruction, and the user equipment itself can paraphrase the input sentence to obtain the paraphrase sentence.
  • The user equipment itself can execute the sentence paraphrase method of the embodiments of the present application.
  • Fig. 3 is a schematic diagram of a natural language processing related device provided by an embodiment of the present application.
  • The user equipment in FIG. 1 and FIG. 2 may specifically be the local device 301 or the local device 302 in FIG. 3, and the data processing device in FIG. 1 may specifically be the execution device 210 in FIG. 3, where the data storage system 250 may store the to-be-processed data of the execution device 210; the data storage system 250 may be integrated on the execution device 210, or may be set on the cloud or on another network server.
  • The data processing devices in Figures 1 and 2 can perform data training / machine learning / deep learning through neural network models or other models (for example, models based on support vector machines), and use the finally trained or learned model to paraphrase the input sentence so as to obtain the paraphrase sentence.
  • a neural network can be composed of neural units.
  • A neural unit can refer to an arithmetic unit that takes $x_s$ and an intercept of 1 as inputs. The output of the arithmetic unit can be $h_{W,b}(x)=f(W^{T}x)=f\left(\sum_{s=1}^{n}W_{s}x_{s}+b\right)$, where $s=1,2,\ldots,n$, $n$ is a natural number greater than 1, $W_s$ is the weight of $x_s$, and $b$ is the bias of the neural unit. $f$ is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal of the neural unit into an output signal. The output signal of the activation function can be used as the input of the next layer, and the activation function can be, for example, a sigmoid function.
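  • For concreteness, the following small sketch (illustrative only, with made-up weights) computes the output of a single neural unit as described above, using a sigmoid activation function:

```python
import math

# Illustrative computation of a single neural unit's output
# f(sum_s W_s * x_s + b) with a sigmoid activation function f.

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def neural_unit(x, W, b):
    # Weighted sum of the inputs plus the bias, passed through the activation.
    return sigmoid(sum(w_s * x_s for w_s, x_s in zip(W, x)) + b)

print(neural_unit(x=[1.0, 2.0], W=[0.5, -0.25], b=0.1))  # output signal of the unit
```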
  • a neural network is a network formed by connecting multiple above-mentioned single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the characteristics of the local receptive field.
  • the local receptive field can be a region composed of several neural units.
  • A deep neural network (DNN), also known as a multi-layer neural network, is divided according to the positions of its layers: the layers inside the DNN can be divided into three categories, namely the input layer, the hidden layers, and the output layer. Generally, the first layer is the input layer, the last layer is the output layer, and all layers in between are hidden layers. The layers are fully connected, that is to say, any neuron in the i-th layer is connected to every neuron in the (i+1)-th layer.
  • Although a DNN looks complicated, the work of each layer is not complicated. Simply put, each layer computes the linear relationship expression $\vec{y}=\alpha(W\vec{x}+\vec{b})$, where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset vector, $W$ is the weight matrix (also called the coefficients), and $\alpha()$ is the activation function. Each layer simply performs this operation on the input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Because a DNN has many layers, the numbers of coefficient matrices $W$ and offset vectors $\vec{b}$ are also large. These parameters are defined in the DNN as follows, taking the coefficient $W$ as an example: suppose that in a three-layer DNN, the linear coefficient from the fourth neuron of the second layer to the second neuron of the third layer is defined as $W_{24}^{3}$, where the superscript 3 represents the layer in which the coefficient $W$ is located, and the subscript corresponds to the output index 2 of the third layer and the input index 4 of the second layer. In summary, the coefficient from the k-th neuron of the (L-1)-th layer to the j-th neuron of the L-th layer is defined as $W_{jk}^{L}$.
  • Convolutional neural network (convolutional neuron network, CNN) is a deep neural network with a convolutional structure.
  • the convolutional neural network contains a feature extractor composed of a convolutional layer and a sub-sampling layer.
  • the feature extractor can be regarded as a filter.
  • the convolutional layer refers to the neuron layer that performs convolution processing on the input signal in the convolutional neural network.
  • a neuron can only be connected to a part of the neighboring neurons.
  • a convolutional layer usually contains several feature planes, and each feature plane can be composed of some rectangularly arranged neural units. Neural units in the same feature plane share weights, and the shared weights here are the convolution kernels.
  • Weight sharing can be understood as meaning that the way image information is extracted is independent of location.
  • the convolution kernel can be initialized in the form of a matrix of random size, and the convolution kernel can obtain reasonable weights through learning during the training process of the convolutional neural network.
  • the direct benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, and at the same time reduce the risk of overfitting.
  • Recurrent neural networks (RNNs) are used to process sequence data.
  • In a traditional neural network model, the layers are fully connected, while the nodes within each layer are unconnected.
  • Although this ordinary neural network has solved many problems, it is still incapable of handling many others. For example, to predict the next word of a sentence, the previous words are generally needed, because the preceding and following words in a sentence are not independent. RNNs are called recurrent neural networks because the current output for a sequence is also related to the previous outputs.
  • RNN can process sequence data of any length.
  • the training of RNN is the same as the training of traditional CNN or DNN.
  • the backpropagation (BP) algorithm is also used, but there is a difference.
  • This learning algorithm is called the backpropagation through time (BPTT) algorithm.
  • Taking the loss function as an example: the higher the output value (loss) of the loss function, the greater the difference between the predicted value and the target value, so the training of the deep neural network becomes a process of reducing this loss as much as possible.
  • A neural network can use the error backpropagation (BP) algorithm to correct the values of the parameters in the initial neural network model during training, so that the reconstruction error loss of the neural network model becomes smaller and smaller. Specifically, forward propagation of the input signal up to the output produces an error loss, and the parameters in the initial neural network model are updated by backpropagating the error loss information, so that the error loss converges.
  • The backpropagation algorithm is a backward pass dominated by the error loss, and aims to obtain the optimal parameters of the neural network model, such as the weight matrices.
  • an embodiment of the present application provides a system architecture 100.
  • the data collection device 160 is used to collect training data.
  • the training data includes multiple sentences in different languages.
  • the training data may include a sequence composed of sentences in the machine translation corpus and their corresponding translations, that is, a sequence composed of multiple sentences of the same semantics but in different languages.
  • the spliced sequence may also be referred to as a training sentence.
  • the splicing here refers to the two sentences being arranged in sequence.
  • For example, the Chinese sentence may be placed before the corresponding English sentence, or the English sentence may be placed before the corresponding Chinese sentence, which is not limited in this embodiment of the application.
  • the data collection device 160 stores the training data in the database 130, and the training device 120 trains to obtain the target model/rule 101 based on the training data maintained in the database 130.
  • The training device 120 processes the training sentences to obtain paraphrase sentences, and determines the training objective of the target model/rule 101 according to the paraphrase sentences, until the reward of the target model/rule 101 is greater than a certain threshold (and/or less than a certain threshold), thereby completing the training of the target model/rule 101.
  • The above-mentioned target model/rule 101 can be used to implement the sentence paraphrase method of the embodiments of the present application; that is, after relevant preprocessing (which may be performed by the preprocessing module 113 and/or the preprocessing module 114), the sentence is input into the target model/rule 101, and the paraphrase sentence can be obtained.
  • the target model/rule 101 in the embodiment of the present application may specifically be (multiple) neural networks.
  • the training data maintained in the database 130 may not all come from the collection of the data collection device 160, and may also be received from other devices.
  • The training device 120 does not necessarily train the target model/rule 101 entirely based on the training data maintained in the database 130; it may also obtain training data from the cloud or elsewhere for model training. The above description should not be taken as a limitation on the embodiments of this application.
  • the target model/rule 101 trained according to the training device 120 can be applied to different systems or devices, such as the execution device 110 shown in FIG. 4, which can be a terminal, such as a mobile phone terminal, a tablet computer, notebook computers, augmented reality (AR)/virtual reality (VR), in-vehicle terminals, etc., can also be servers or clouds.
  • the execution device 110 is configured with an input/output (input/output, I/O) interface 112 for data interaction with external devices.
  • the user can input data to the I/O interface 112 through the client device 140.
  • the input data in this embodiment of the application may include: training sentences input by the client device.
  • the preprocessing module 113 and the preprocessing module 114 are used to preprocess the input data (such as training sentences) received by the I/O interface 112 (specifically, the training sentences can be processed to obtain word vectors). However, there may be no preprocessing module 113 and preprocessing module 114 (or only one of the preprocessing modules), and the calculation module 111 is directly used to process the input data.
  • the execution device 110 may call data, codes, etc. in the data storage system 150 for corresponding processing .
  • the data, instructions, etc. obtained by corresponding processing may also be stored in the data storage system 150.
  • the I/O interface 112 feeds back the processing result, for example, a paraphrase sentence, to the client device 140.
  • It is worth noting that the training device 120 can generate, for different downstream systems, a target model/rule 101 corresponding to that downstream system, and the corresponding target model/rule 101 can be used to achieve the above goals or complete the above tasks, thereby providing the user with the desired result.
  • the user can manually set input data (for example, input a paragraph of text), and the manual setting can be operated through the interface provided by the I/O interface 112.
  • The client device 140 can automatically send input data (for example, a paragraph of text) to the I/O interface 112; if the client device 140 needs to obtain the user's authorization before automatically sending the input data, the user can set the corresponding permissions in the client device 140.
  • the user can view the result output by the execution device 110 on the client device 140, and the specific presentation form may be a specific manner such as display, sound, and action (for example, the output result may be a paraphrase).
  • the client device 140 can also be used as a data collection terminal to collect the input data of the input I/O interface 112 and the output result of the output I/O interface 112 as new sample data, and store it in the database 130 as shown in the figure.
  • Alternatively, without collection by the client device 140, the I/O interface 112 may directly use the input data input to the I/O interface 112 and the output result output from the I/O interface 112 as new sample data, as shown in the figure, and store the data in the database 130.
  • FIG. 4 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship among the devices, devices, modules, etc. shown in the figure does not constitute any limitation.
  • the data storage system 150 is an external memory relative to the execution device 110. In other cases, the data storage system 150 may also be placed in the execution device 110.
  • the target model/rule 101 is obtained through training according to the training device 120.
  • the target model/rule 101 may be the sentence paraphrase model in the embodiment of the present application.
  • The sentence paraphrase model in the embodiments of the present application may include at least one neural network, and the at least one neural network may include a CNN, a deep convolutional neural network (DCNN), a recurrent neural network (RNN), a Transformer language model, and so on.
  • FIG. 5 is a schematic diagram of the hardware structure of a chip provided by an embodiment of the application.
  • the chip includes a neural network processor (neural processing unit, NPU) 50.
  • the chip can be set in the execution device 110 as shown in FIG. 4 to complete the calculation work of the calculation module 111.
  • the chip can also be set in the training device 120 as shown in FIG. 4 to complete the training work of the training device 120 and output the target model/rule 101.
  • the sentence paraphrase model (language model) in the embodiment of the present application can be implemented in the chip as shown in FIG. 5.
  • the sentence paraphrase method of the embodiment of the present application can be specifically executed in the arithmetic circuit 503 and/or the vector calculation unit 507 in the NPU 50 to obtain the paraphrase sentence.
  • the NPU 50 can be mounted on the host CPU as a coprocessor, and the host CPU distributes tasks.
  • the core part of the NPU 50 is the arithmetic circuit 503.
  • the controller 504 in the NPU 50 can control the arithmetic circuit 503 to extract data from the memory (weight memory or input memory) and perform calculations.
  • the arithmetic circuit 503 includes multiple processing units (process engines, PE). In some implementations, the arithmetic circuit 503 is a two-dimensional systolic array. The arithmetic circuit 503 may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 503 is a general-purpose matrix processor.
  • the arithmetic circuit fetches the corresponding data of matrix B from the weight memory 502 and caches it on each PE in the arithmetic circuit.
• the arithmetic circuit fetches the data of matrix A from the input memory 501, performs a matrix operation on it with matrix B, and stores the obtained partial result or final result of the matrix in an accumulator 508.
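• As a rough, hedged illustration of the data flow described above (matrix B cached from the weight memory, matrix A streamed from the input memory, partial results collected in an accumulator), the following Python sketch mimics the accumulation; the names input_memory, weight_memory and accumulator are illustrative only and do not correspond to any real NPU programming interface.

    import numpy as np

    def matmul_with_accumulator(input_memory: np.ndarray, weight_memory: np.ndarray) -> np.ndarray:
        """Toy model of the arithmetic circuit: accumulate partial products of A and B."""
        A = input_memory    # stands in for matrix A fetched from the input memory 501
        B = weight_memory   # stands in for matrix B cached from the weight memory 502
        accumulator = np.zeros((A.shape[0], B.shape[1]))
        for k in range(A.shape[1]):                      # stream one column/row pair at a time
            accumulator += np.outer(A[:, k], B[k, :])    # partial result kept in the accumulator
        return accumulator

    # simple check against a direct matrix product
    A = np.random.rand(4, 3)
    B = np.random.rand(3, 5)
    assert np.allclose(matmul_with_accumulator(A, B), A @ B)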
  • the vector calculation unit 507 can perform further processing on the output of the arithmetic circuit, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison, and so on.
• the vector calculation unit 507 can be used for network calculations in the non-convolutional/non-fully-connected (FC) layers of the neural network, such as pooling, batch normalization, local response normalization, and so on.
  • the vector calculation unit 507 can store the processed output vector to the unified buffer 506.
  • the vector calculation unit 507 may apply a nonlinear function to the output of the arithmetic circuit 503, such as a vector of accumulated values, to generate the activation value.
  • the vector calculation unit 507 generates a normalized value, a combined value, or both.
  • the processed output vector can be used as an activation input to the arithmetic circuit 503, for example for use in a subsequent layer in a neural network.
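• As a similar toy illustration of the post-processing performed by the vector calculation unit, the sketch below applies a nonlinear activation and a simple normalization to a vector of accumulated values; the particular functions chosen (ReLU and mean/variance normalization) are assumptions for illustration only.

    import numpy as np

    def vector_postprocess(accumulated: np.ndarray) -> np.ndarray:
        """Apply a nonlinear function to the accumulated values, then normalize them."""
        activated = np.maximum(accumulated, 0.0)               # nonlinear activation (ReLU)
        mean = activated.mean(axis=-1, keepdims=True)
        std = activated.std(axis=-1, keepdims=True) + 1e-6
        return (activated - mean) / std                        # simple normalization step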
  • the unified memory 506 is used to store input data and output data.
• the storage unit access controller 505 (direct memory access controller, DMAC) transfers the input data in the external memory to the input memory 501 and/or the unified memory 506, stores the weight data in the external memory into the weight memory 502, and stores the data in the unified memory 506 into the external memory.
  • the bus interface unit (BIU) 510 is used to implement interaction between the main CPU, the DMAC, and the instruction fetch memory 509 through the bus.
  • An instruction fetch buffer 509 connected to the controller 504 is used to store instructions used by the controller 504;
• the controller 504 is used to call the instructions cached in the instruction fetch memory 509 to control the working process of the operation accelerator.
  • the unified memory 506, the input memory 501, the weight memory 502, and the fetch memory 509 may all be on-chip memories.
  • the external memory of the NPU may be a memory external to the NPU, and the external memory may be a double data rate synchronous dynamic random access memory (double data rate synchronous dynamic random access memory, DDR SDRAM), high bandwidth memory (high bandwidth memory, HBM), or Other readable and writable memory.
• the sentence paraphrase method of the embodiment of the present application will be described in detail below with reference to the accompanying drawings.
• the sentence paraphrase method of the embodiment of the present application can be executed by the data processing device in FIG. 1, the user equipment in FIG. 2, the execution device 210 in FIG. 3, or the execution device 110 in FIG. 4, and the execution device 110 in FIG. 4 may include the chip shown in FIG. 5.
• the sentence paraphrase method provided in the embodiment of the present application can be executed on a server, on the cloud, or on a terminal device.
• taking a terminal device as an example, as shown in FIG. 6, the technical solution of the embodiment of the present application can be applied to a terminal device, and the sentence paraphrase method in the embodiment of the present application can paraphrase an input sentence to obtain a paraphrase sentence of the input sentence.
  • the terminal device may be mobile or fixed.
• the terminal device may be a mobile phone with a natural language processing function, a tablet personal computer (TPC), a media player, a smart TV, a laptop computer (LC), a personal digital assistant (PDA), a personal computer (PC), a camera, a video camera, a smart watch, a wearable device (WD), a self-driving vehicle, or the like, which is not limited in the embodiment of the present application.
• Paraphrasing refers to expressing the same semantics of a sentence in different ways. Paraphrase is very common in natural language, and in the field of natural language processing (NLP) it is being used more and more widely. For example, paraphrasing can be applied to the following fields.
• in machine translation, paraphrase technology can be used to synonymously rewrite the sentence to be translated so as to generate sentences that are easier to translate; for example, flexible and non-standard spoken language can be paraphrased into standardized sentences so as to achieve better translation results. For another example, paraphrase technology can also alleviate the problem of sparse data in machine translation systems, that is, the training corpus can be augmented through paraphrasing. In addition, paraphrase technology is also used to improve the evaluation of machine translation.
  • the questions submitted by users to the Q&A system can be rewritten online, and then all submitted to the Q&A system to recall the results; or, part of the text content in the knowledge base can also be paraphrased and expanded and added to the knowledge base.
• paraphrase technology can automatically generate a large number of extraction templates for an information extraction system, thereby improving the performance of the extraction system.
  • paraphrase technology can be used to rewrite and expand query terms, thereby optimizing the quality of information retrieval.
  • paraphrase technology can be used to calculate the similarity of sentences, so as to better perform sentence clustering and selection; secondly, similar to the application in machine translation, paraphrase technology can improve the evaluation of automatic summarization.
• the sentence paraphrase method in the embodiment of the present application can be applied in all of the above-mentioned fields.
• In natural language processing, paraphrase generation mainly involves two aspects: the quality of the paraphrase (sentences) and the diversity of the paraphrase (sentences).
• the quality of a paraphrase refers to whether the generated paraphrase is fluent and consistent in meaning with the input sentence. For example, if the input sentence is "what is the distance from the sun to the earth" and the generated paraphrase sentence is "how far is the earth from the sun", the paraphrase sentence is fluent and synonymous with the input sentence, so it is a high-quality paraphrase. If the generated paraphrase sentence is "How far is the distance from the sun on the earth", the paraphrase sentence is not fluent; if the generated paraphrase sentence is "What is the distance from the moon to Mars", the semantics of the paraphrase are irrelevant to the input sentence. These two paraphrase sentences are low-quality paraphrase sentences.
• the diversity of paraphrases refers to whether the generated paraphrases are varied and informative.
• for example, if the input sentence is "what is the distance between the sun and the earth" and the generated multiple paraphrase sentences (such as "how far is the earth from the sun") are all synonymous with the input sentence but each expresses it differently, the diversity of these paraphrase sentences is good.
  • the embodiments of the present application propose a sentence paraphrase method and a method for training a sentence paraphrase model, which can easily obtain a paraphrase sentence.
  • FIG. 7 is a schematic flowchart of a method 700 for training a sentence paraphrase model provided by an embodiment of the present application.
  • the method 700 shown in FIG. 7 may be executed by the terminal device in FIG. 6.
  • the sentence paraphrase model can be used to paraphrase the input sentence (or training sentence) to obtain the paraphrase sentence of the input sentence.
  • the method shown in FIG. 7 may include step 710 and step 720, which are respectively described in detail below.
• in S710, training data is obtained. The training data may include multiple sentences, the multiple sentences are in different languages, and the multiple sentences have the same meaning.
  • the training data may include two sentences, which are the Chinese sentence "What is the distance between the sun and the earth” and the English sentence "What is the distance between the sun and the earth”.
  • the language of the above two sentences is different, and both of these sentences have the same meaning.
  • the above-mentioned English sentence is an English translation of the above-mentioned Chinese sentence (or it can also be said that the above-mentioned Chinese sentence is a Chinese translation of the above-mentioned English sentence).
  • the number of sentences and the language types of sentences included in the training data are not limited.
  • the training data may include a sentence and one or more translations in different languages corresponding to the sentence.
  • the training data may include a Chinese sentence and an English translation corresponding to the Chinese sentence.
  • the training data may be a sequence composed of multiple sentences in different languages, where the multiple sentences may have the same meaning.
  • the training data may include a sequence composed of sentences in the machine translation corpus and their corresponding translations, that is, a sequence composed of multiple sentences of the same semantics but in different languages.
• for example, the training data in this embodiment of the application may be a sequence formed by splicing together a Chinese sentence from the machine translation corpus and its corresponding English sentence (the English translation of the Chinese sentence).
  • the splicing here refers to the two sentences being arranged in sequence.
• for example, the Chinese sentence may be located in front of the corresponding English sentence, or the English sentence may be located in front of the corresponding Chinese sentence, which is not limited in the embodiment of the present application.
  • the training data may also include language type indication information.
  • the language type indication information may be Chinese type indication information, English type indication information, French type indication information, or the like.
• for example, when the training data includes a sequence composed of a Chinese sentence and the English translation corresponding to the Chinese sentence, in the sequence, Chinese type indication information may be included before the Chinese sentence, and the Chinese type indication information may indicate that the following sentence is a Chinese sentence; for example, the Chinese type indication information can be "<zh>".
  • the English sentence (that is, the English translation corresponding to the Chinese sentence) may include English type indication information in front of it, and the English type indication information may indicate that the following sentence is an English sentence.
• for example, the English type indication information may be "<en>".
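• As an illustration of how such a spliced training sequence might be assembled, the following sketch prepends the language type indication information "<zh>" and "<en>" to a Chinese sentence and its English translation and joins them into one sequence; the function name and the whitespace joining are assumptions made for illustration rather than a format prescribed by the embodiment.

    LANG_TAGS = {"zh": "<zh>", "en": "<en>", "fr": "<fr>", "de": "<de>"}

    def build_training_sequence(sentence: str, translation: str,
                                src_lang: str = "zh", tgt_lang: str = "en") -> str:
        """Splice a sentence and its translation into one training sequence,
        each preceded by its language type indication information."""
        return f"{LANG_TAGS[src_lang]} {sentence} {LANG_TAGS[tgt_lang]} {translation}"

    example = build_training_sequence("太阳到地球的距离是多少",
                                      "What is the distance between the sun and the earth")
    # -> "<zh> 太阳到地球的距离是多少 <en> What is the distance between the sun and the earth"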
  • the training data may also include sentences in other languages such as French, German, and Spanish.
  • the training data may include a Chinese sentence, a French translation corresponding to the sentence, and a German translation corresponding to the Chinese sentence.
  • At least one of the multiple sentences included in the training data may be a sentence that has undergone disturbance processing.
  • the disturbance processing may include at least one of randomly deleting words in the sentence, randomly changing the word order of the words in the sentence, and randomly inserting words into the sentence.
• using the perturbed training data to train the sentence paraphrase model can improve the robustness of the sentence paraphrase model, so that the semantics of the generated paraphrase sentences are more accurate and their forms are more diverse.
  • the disturbance processing may also include other disturbance processing performed on the sentence, which is not limited in the embodiment of the present application.
  • the disturbance processing may be denoising auto-encoder (DAE) technology.
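• A minimal sketch of such disturbance processing is given below, assuming whitespace-tokenized input; the perturbation probabilities and the placeholder token are illustrative assumptions rather than values specified by the embodiment.

    import random

    def disturb(sentence: str, p_delete: float = 0.1, p_swap: float = 0.1,
                p_insert: float = 0.1, filler: str = "<unk>") -> str:
        """Randomly delete words, change the word order, and insert words into a sentence."""
        words = sentence.split()
        # randomly delete words (keep the original if everything would be deleted)
        words = [w for w in words if random.random() > p_delete] or sentence.split()
        # randomly change the word order by swapping adjacent words
        for i in range(len(words) - 1):
            if random.random() < p_swap:
                words[i], words[i + 1] = words[i + 1], words[i]
        # randomly insert a filler word before some positions
        noisy = []
        for w in words:
            if random.random() < p_insert:
                noisy.append(filler)
            noisy.append(w)
        return " ".join(noisy)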
  • the method 700 may further include S701.
  • S701 can be called a pre-training process, which is specifically as follows:
  • S701 Obtain pre-training data, and train the sentence paraphrase model according to the pre-training data.
  • the pre-training data may include one or more sentences, and the language of the sentences included in the pre-training data is one or more of the language types of the sentences included in the training data.
  • the pre-training data may include one sentence, and the sentence is a Chinese sentence or an English sentence; or the pre-training data may include multiple sentences, and the multiple The sentences include Chinese sentences and/or English sentences.
• when the sentence paraphrase model is trained with pre-training data including one sentence, the pre-training data may be used as the input of the sentence paraphrase model, and at the same time the sentence in the pre-training data may be used as the label of that input (that is, the true value); the backpropagation algorithm is used to continuously obtain pre-training data for iterative training until the loss function converges, at which point this training of the sentence paraphrase model is completed.
• in other words, the pre-training data is used as the input of the sentence paraphrase model, and the sentence in the pre-training data is used as the label of that input, so that after this training the sentence paraphrase model can output fluent sentences.
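• The pre-training procedure described above can be sketched as an ordinary language-model training loop, where each sentence serves both as the input and as its own label; the model, optimizer, and batch iterator below are placeholders for whatever autoregressive language model is used, so this is a minimal sketch of the loop structure rather than the exact implementation of the embodiment.

    import torch
    import torch.nn.functional as F

    def pretrain(model, optimizer, batches, pad_id: int = 0, max_steps: int = 100000):
        """Each batch of token ids is both the input and the label (the true value)."""
        model.train()
        for step, token_ids in enumerate(batches):         # token_ids: LongTensor [batch, seq_len]
            logits = model(token_ids[:, :-1])               # predict the next token at each position
            loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                                   token_ids[:, 1:].reshape(-1),
                                   ignore_index=pad_id)
            optimizer.zero_grad()
            loss.backward()                                 # backpropagation
            optimizer.step()
            if step >= max_steps:                           # in practice: stop once the loss converges
                break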
  • the multiple sentences in the pre-training data may be in multiple different languages, which is not limited in the embodiment of the present application.
  • the sentences in the pre-training data may be sentences in a non-parallel corpus, and the non-parallel corpus may include sentences in multiple different languages.
• training the sentence paraphrase model with pre-training data that includes sentences in different languages can enable the sentence paraphrase model to output fluent sentences in those languages (that is, the languages of the sentences included in the pre-training data).
  • using pre-training data including at least one sentence to train the sentence paraphrase model can improve the fluency of the sentence (paraphrase sentence) generated by the sentence paraphrase model.
  • the sentence paraphrase model may be a language model.
  • the sentence paraphrase model may be a Transformer language model, a CNN language model, or an RNN language model, etc., or the sentence paraphrase model may also be another deep learning language model, which is not limited in this embodiment of the application.
  • the CNN language model is limited by the fixed size of the filter, and can only model partial inter-word relationships;
  • the RNN language model is limited by the iterative calculation of cycles, and it is difficult to calculate multiple nodes in parallel.
  • the Transformer language model can overcome the weakness that the CNN language model can only establish local connections between words, and it can also overcome the weakness that the RNN language model is difficult to calculate in parallel on the GPU and establish two-way connections. Therefore, the overall efficiency of the sentence paraphrase model can be improved.
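• The contrast drawn above can be made concrete with a minimal scaled dot-product self-attention step, the core operation of a Transformer language model: every position attends to every earlier position in one parallel matrix computation, rather than through a fixed-size filter or a step-by-step recurrence. This is a generic sketch of standard Transformer attention, not the specific network of the embodiment.

    import numpy as np

    def causal_self_attention(X, Wq, Wk, Wv):
        """X: [seq_len, d_model]; returns context-dependent hidden states of the same shape."""
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(K.shape[-1])
        # causal mask: each word may only attend to itself and the words before it
        mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(mask, -1e9, scores)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ V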
  • the (trained) sentence paraphrase model can be used to generate a paraphrase sentence of the input sentence based on the input sentence.
  • the paraphrase sentence and the input sentence may have the same meaning, and the language type of the paraphrase sentence and the language type of the input sentence may be the same; or, the language type of the paraphrase sentence and the language type of the input sentence also It can be different.
• for example, if the input sentence is a Chinese sentence, the paraphrase sentence of the input sentence obtained by the sentence paraphrase model may be a French sentence, and the Chinese sentence has the same meaning as the French sentence (that is, the Chinese sentence and the French sentence are different expressions of the same semantics in different languages).
  • the language type of the paraphrase sentence may be determined according to the language type indication parameter.
• in S720, when the sentence paraphrase model is trained with the training data, the training data including multiple sentences may be used as the input of the sentence paraphrase model, and at the same time the multiple sentences in the training data may be used as the label of that input (that is, the true value); the back propagation algorithm is used to continuously obtain training data for iterative training until the loss function converges, at which point the training of the sentence paraphrase model is completed.
  • the training process in S720 is similar to the training process in S701 above, and the specific training process may be as described in FIG. 9 below.
  • the sentence paraphrase model may also include a language type indicator parameter.
  • the language type indicating parameter may be used to indicate the language type of the paraphrase sentence generated by the sentence paraphrase model.
• the sentence paraphrase model includes a language indicator parameter; training the sentence paraphrase model according to the training data and the language indicator parameter enables the sentence paraphrase model to generate (according to the language indicator parameter) paraphrase sentences in multiple different languages.
  • the language indicating parameter is a parameter of the sentence paraphrase model.
• the Transformer language model in this embodiment of the application differs from the Transformer language model in the prior art in that the Transformer language model in the embodiment of the application includes the language indicator parameter.
  • the method 700 may further include: obtaining language indication information.
  • the language type indication information may be used to determine the language type of the paraphrase sentence generated by the sentence paraphrase model.
• training the sentence paraphrase model based on the training data may include: training the sentence paraphrase model according to the training data and the language indicator parameter.
• the language indicator parameter is set through the language indication information, and the language type of the paraphrase sentence generated by the sentence paraphrase model can be flexibly and conveniently controlled according to the language indicator parameter.
  • the foregoing determination of the language indication parameter according to the language indication information may refer to: setting the language indication parameter according to the language indication information. Therefore, the language type of the paraphrase sentence generated by the sentence paraphrase model can be controlled according to the set language type indicating parameter.
  • the training data includes multiple sentences with different language types and the same meaning.
• the training data is used directly to train the sentence paraphrase model, instead of relying on a paraphrase sentence pair corpus to train the sentence paraphrase model.
  • the cost of training the sentence paraphrase model is reduced, so that the paraphrase sentence can be easily obtained.
  • FIG. 8 is a schematic flowchart of a method 800 for training a sentence paraphrase model provided by an embodiment of the present application.
  • the method 800 shown in FIG. 8 may be executed by the terminal device in FIG. 6.
  • the sentence paraphrase model can be used to paraphrase the input sentence (or training sentence) to obtain the paraphrase sentence of the input sentence.
  • the method 800 shown in FIG. 8 may include step 810, step 820, and step 830, and these steps are respectively described in detail below.
• in S810, a multilingual non-parallel corpus {S_1, S_2, …, S_n} can be used to pre-train the sentence paraphrase model, where S_i can be a sentence in any language, and n and i are both positive integers.
• S_i can be a sentence in any of various languages; for example, S_i can be the Chinese sentence "太阳到地球的距离是多少" ("what is the distance from the sun to the earth"), or S_i can be the English sentence "What is the distance between the sun and the earth". The language of S_i is not limited in the embodiment of the present application.
  • the sentence paraphrase model may be a Transformer language model.
  • the Transformer language model in the embodiment of the present application has language indication parameters, which is different from the Transformer language model in the prior art.
• taking a Chinese input sentence as an example, the input sentence "what is the distance from the sun to the earth" can be input into the sentence paraphrase model, and after processing, the multiple word vectors of the input sentence, that is, the vector E of the input sentence, can be obtained.
  • the vector E of the input sentence is calculated through a sentence paraphrase model (such as the multilayer Transformer module connected by the attention mechanism in Figure 9), and a set of context-related hidden state vectors (of the input sentence) can be obtained.
  • a hidden state vector (hidden state) H can be obtained, and the generated word at the current time can be predicted based on the hidden state vector H at each time.
• the sentence paraphrase model can be expressed as a conditional probability of the form p(y_t | y_<t, x; θ, h_zh), where θ is a parameter to be learned in the sentence paraphrase model, h_zh is a language indicator parameter, x is the input sentence, y_t is the word generated at time t, and t is a positive integer.
  • the language indicator parameter may include multiple language indicator parameters of different languages.
• for example, the language indicator parameters may include one or more of a Chinese indicator parameter h_zh, an English indicator parameter h_en, a German indicator parameter h_de, a Spanish indicator parameter h_es, and so on.
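• One simple way to realize such language indicator parameters is to keep one learnable vector per language (h_zh, h_en, h_de, h_es, ...) and add the selected vector to the token embeddings fed into the model, so that the same parameters θ yield p(y_t | y_<t, x; θ, h_lang) for whichever language is requested. The sketch below illustrates this idea; the way the vector is injected (added to the embeddings) is an assumption made for illustration, not the method fixed by the embodiment.

    import torch
    import torch.nn as nn

    class LanguageIndicator(nn.Module):
        """Learnable language indicator parameters h_zh, h_en, ... of a paraphrase model."""
        def __init__(self, languages, d_model):
            super().__init__()
            self.index = {lang: i for i, lang in enumerate(languages)}
            self.embedding = nn.Embedding(len(languages), d_model)

        def forward(self, lang: str, token_embeddings: torch.Tensor) -> torch.Tensor:
            # token_embeddings: [batch, seq_len, d_model]; add h_lang to every position
            h_lang = self.embedding(torch.tensor(self.index[lang]))
            return token_embeddings + h_lang

    # e.g. indicator = LanguageIndicator(["zh", "en", "de", "es"], d_model=512)
    #      conditioned = indicator("en", embeddings)   # steer the model towards English output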
• after this pre-training, the sentence paraphrase model can generate fluent natural language sentences in the specified language, but cannot yet generate a paraphrase of the input sentence.
  • the pre-training process in S810 may use a back propagation algorithm to continuously obtain pre-training data for iterative training until the loss function converges, and then complete the pre-training of the sentence paraphrase model.
  • S820 Use the parallel corpus of machine translation to train the sentence paraphrase model.
• a machine translation parallel corpus {(X_1, Y_1), (X_2, Y_2), …, (X_m, Y_m)} can be used to train the sentence paraphrase model, where Y_i may be a translation of the input sentence X_i, and m and i are both positive integers.
• X_i can be a sentence in any of various languages; for example, X_i can be the Chinese sentence "太阳到地球的距离是多少" ("what is the distance from the sun to the earth"), or X_i can be the English sentence "What is the distance between the sun and the earth". The language of X_i is not limited in the embodiment of the present application.
• similarly, the language of Y_i is not limited in the embodiment of the present application, that is, Y_i may be a translation of the input sentence X_i in any language.
• the input sentence X_i and its corresponding translation Y_i are spliced into one sequence.
• the Chinese type indication information "<zh>" may be used to indicate that the subsequent input sentence is Chinese, and the English type indication information "<en>" may be used to indicate that the subsequent input sentence is English.
  • the language indicator parameter h zh can control the corresponding input sentence (or word) to be Chinese
  • the language indicator parameter h en can control the corresponding input sentence (or word) to be English.
• before the sequence formed by splicing the input sentence X_i and its corresponding translation Y_i is used to train the sentence paraphrase model, disturbance processing may also be performed on at least one of the input sentence X_i and its corresponding translation Y_i.
  • the disturbance processing may include at least one of randomly deleting words in the sentence, randomly changing the word order of the sentence, and randomly inserting words into the sentence.
  • the disturbance processing may also include other disturbance processing performed on the sentence, which is not limited in the embodiment of the present application.
• for example, denoising auto-encoder technology may be used to perform the disturbance processing on at least one of the input sentence X_i and its corresponding translation Y_i.
  • the training process in S820 is similar to the pre-training process in S810.
  • the back-propagation algorithm is used to continuously obtain training data for iterative training until the loss function converges, and then the training of the sentence paraphrase model is completed.
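• Putting these pieces together, the training sequences for this step can be produced by splicing each (X_i, Y_i) pair with its language type indication information and optionally disturbing the sentences, reusing the build_training_sequence and disturb sketches given earlier; the generator below is an illustrative assumption about how the parallel corpus might be streamed, not a prescribed data format.

    def training_sequences(parallel_corpus, apply_noise: bool = True):
        """parallel_corpus: iterable of (chinese_sentence, english_translation) pairs."""
        for x_i, y_i in parallel_corpus:
            if apply_noise:
                x_i, y_i = disturb(x_i), disturb(y_i)   # perturb at least one of the two sentences
            yield build_training_sequence(x_i, y_i, "zh", "en")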
• by means of the language indicator parameters, the language types of the paraphrase sentences generated by the sentence paraphrase model can be controlled.
• after the sentence paraphrase model undergoes the above-mentioned training process, it can enter the testing phase, that is, it can be used to generate actual paraphrase sentences.
  • the sentence paraphrase model can be used to generate one or more paraphrase sentences.
• in the embodiment of the present application, the sentence paraphrase model is trained without relying on a paraphrase sentence pair corpus; instead, the above training data is used directly to train the sentence paraphrase model. Therefore, the cost of training the sentence paraphrase model can be reduced, and paraphrase sentences can be obtained conveniently.
• FIG. 11 shows a schematic flowchart of a sentence paraphrase method 1100 provided by an embodiment of the present application.
  • the method shown in FIG. 11 may be executed by the terminal device in FIG. 6.
  • the method shown in FIG. 11 includes step 1110 and step 1120, and these steps are respectively described in detail below.
• in S1110, an input sentence is obtained. The input sentence may be a word, a phrase, or a sentence, and the input sentence may be in any of various languages, which is not limited in the embodiment of the present application.
  • S1120 Paraphrase the input sentence through a sentence paraphrase model to generate a paraphrase sentence of the input sentence.
  • the paraphrase sentence and the input sentence may have the same meaning, and the language type of the paraphrase sentence and the language type of the input sentence may be the same; or, the language type of the paraphrase sentence and the language type of the input sentence may also be different .
  • the language type of the paraphrase sentence may be determined according to the language type indication parameter.
  • the paraphrase sentence of the input sentence may be a vocabulary, phrase, or sentence.
  • the paraphrase sentence may also be in various languages, which is not limited in the embodiment of the present application.
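• As a hedged sketch of how such generation might look in practice (the model and tokenizer below are placeholders, and requesting the target language by ending the prompt with a language tag that mirrors the spliced training format is an assumption made for illustration), the paraphrase sentence can be decoded greedily word by word; sampling or beam search could equally be used to obtain several different paraphrase sentences.

    import torch

    @torch.no_grad()
    def generate_paraphrase(model, tokenizer, input_sentence: str,
                            input_tag: str = "<zh>", target_tag: str = "<zh>",
                            max_len: int = 50, eos_id: int = 2) -> str:
        """Prompt with '<input_tag> input_sentence <target_tag>', then decode greedily."""
        ids = tokenizer.encode(f"{input_tag} {input_sentence} {target_tag}")
        generated = []
        for _ in range(max_len):
            logits = model(torch.tensor([ids + generated]))   # [1, seq_len, vocab_size]
            next_id = int(logits[0, -1].argmax())
            if next_id == eos_id:
                break
            generated.append(next_id)
        return tokenizer.decode(generated)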
  • the sentence paraphrase model may be obtained after training using training data, and the training data may include multiple sentences, the multiple sentences are of different language types, and the multiple sentences have the same meaning.
• the sentence paraphrase model includes a language type indicator parameter for indicating the language type of the paraphrase sentence generated by the sentence paraphrase model, wherein the sentence paraphrase model is obtained after training based on the training data and the language type indicator parameter.
• the sentence paraphrase model includes a language indicator parameter; training the sentence paraphrase model according to the training data and the language indicator parameter enables the sentence paraphrase model to generate (according to the language indicator parameter) paraphrase sentences in multiple different languages.
• the language indicator parameter is determined according to the acquired language indication information.
  • the language indication parameter is set by the language indication information, and the language type of the paraphrase sentence generated by the sentence paraphrase model can be flexibly and conveniently controlled according to the language indication parameter.
• at least one of the multiple sentences included in the training data is a sentence that has undergone disturbance processing, and the disturbance processing includes at least one of randomly deleting words in the sentence, randomly changing the word order of the words in the sentence, and randomly inserting words into the sentence.
• using the perturbed training data to train the sentence paraphrase model can improve the robustness of the sentence paraphrase model, so that the semantics of the generated paraphrase sentences are more accurate and their forms are more diverse.
• the sentence paraphrase model is obtained after first training with pre-training data and then training with the training data, where the pre-training data includes one or more sentences, and the language types of the sentences included in the pre-training data are one or more of the language types of the sentences included in the training data.
  • using pre-training data including at least one sentence to train the sentence paraphrase model can improve the fluency of the sentence (paraphrase sentence) generated by the sentence paraphrase model.
  • sentence paraphrase model in FIG. 11 may be obtained after training using the method of training the sentence paraphrase model shown in FIG. 7 or FIG. 8.
• in the embodiment of the present application, the training data includes multiple sentences that are in different languages and have the same meaning, and the sentence paraphrase model is obtained directly after training with this training data, without relying on a paraphrase sentence pair corpus to train the sentence paraphrase model; this can reduce the cost of training the sentence paraphrase model, so that paraphrase sentences can be obtained conveniently.
  • FIG. 12 is a schematic diagram of the hardware structure of a sentence repetition device in an embodiment of the present application.
  • the sentence repeating device 4000 shown in FIG. 12 includes a memory 4001, a processor 4002, a communication interface 4003, and a bus 4004. Among them, the memory 4001, the processor 4002, and the communication interface 4003 implement communication connections between each other through the bus 4004.
  • the sentence retelling device 4000 may include more or fewer modules or units, which is not limited in the embodiment of the present application.
  • the memory 4001 may be a read only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM).
  • the memory 4001 may store a program. When the program stored in the memory 4001 is executed by the processor 4002, the processor 4002 and the communication interface 4003 are used to execute each step of the sentence repetition method in the embodiment of the present application.
• the processor 4002 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, and is used to execute related programs to realize the functions required by the units in the sentence paraphrase device of the embodiment of the present application, or to execute the sentence paraphrase method of the method embodiment of the present application.
  • the processor 4002 may also be an integrated circuit chip with signal processing capabilities.
  • each step of the sentence repetition method of the embodiment of the present application can be completed by an integrated logic circuit of hardware in the processor 4002 or instructions in the form of software.
• the aforementioned processor 4002 may also be a general-purpose processor, a digital signal processor (digital signal processing, DSP), an ASIC, a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the aforementioned general-purpose processor may be a microprocessor or the processor may also be any conventional processor.
  • the steps of the method disclosed in the embodiments of the present application can be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
• the software module can be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register.
• the storage medium is located in the memory 4001, and the processor 4002 reads the information in the memory 4001 and, in combination with its hardware, completes the functions required by the units included in the sentence paraphrase device of the embodiment of the present application, or executes the sentence paraphrase method of the method embodiment of the present application.
  • the communication interface 4003 uses a transceiver device such as but not limited to a transceiver to implement communication between the device 4000 and other devices or a communication network. For example, the input sentence can be obtained through the communication interface 4003.
  • the bus 4004 may include a path for transferring information between various components of the device 4000 (for example, the memory 4001, the processor 4002, and the communication interface 4003).
  • FIG. 13 is a schematic diagram of the hardware structure of an apparatus 5000 for training sentence retelling models according to an embodiment of the present application. Similar to the above device 4000, the device 5000 for training sentence paraphrase models shown in FIG. 13 includes a memory 5001, a processor 5002, a communication interface 5003, and a bus 5004. Among them, the memory 5001, the processor 5002, and the communication interface 5003 implement communication connections between each other through the bus 5004.
  • the device shown in FIG. 13 is only an example and not a limitation.
  • the device 5000 for training sentence retelling models may include more or fewer modules or units, which is not limited in the embodiment of the present application.
  • the memory 5001 may store a program.
  • the processor 5002 is configured to execute each step of the method for training a sentence paraphrase model in the embodiment of the present application.
  • the processor 5002 may adopt a general CPU, a microprocessor, an ASIC, a GPU or one or more integrated circuits to execute related programs to implement the method for training sentence paraphrase models in the embodiments of the present application.
  • the processor 5002 may also be an integrated circuit chip with signal processing capability.
  • each step of the method of training sentence paraphrase model in the embodiment of the present application can be completed by an integrated logic circuit of hardware in the processor 5002 or instructions in the form of software.
  • the sentence paraphrase model is trained by the device 5000 for training sentence paraphrase models shown in FIG. 13, and the sentence paraphrase model obtained by training can be used to execute the sentence paraphrase method of the embodiment of the present application. Specifically, training the sentence paraphrase model by the device 5000 can obtain the sentence paraphrase model in the method shown in FIG. 11.
  • the device shown in FIG. 13 can obtain training data and the sentence paraphrase model to be trained from the outside through the communication interface 5003, and then the processor trains the sentence paraphrase model to be trained according to the training data.
• although the device 4000 and the device 5000 only show a memory, a processor, and a communication interface, in the specific implementation process, those skilled in the art should understand that the device 4000 and the device 5000 may also include other devices necessary for normal operation.
  • the device 4000 and the device 5000 may also include hardware devices that implement other additional functions.
  • the device 4000 and the device 5000 may also only include the components necessary to implement the embodiments of the present application, and not necessarily include all the components shown in FIG. 12 and FIG. 13.
  • the processor in the embodiment of the present application may be a central processing unit (central processing unit, CPU), and the processor may also be other general-purpose processors, digital signal processors (digital signal processors, DSP), and application-specific integrated circuits. (application specific integrated circuit, ASIC), ready-made programmable gate array (field programmable gate array, FPGA) or other programmable logic devices, discrete gates or transistor logic devices, discrete hardware components, etc.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the memory in the embodiments of the present application may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
• the non-volatile memory can be a read-only memory (ROM), a programmable read-only memory (programmable ROM, PROM), an erasable programmable read-only memory (erasable PROM, EPROM), an electrically erasable programmable read-only memory (electrically EPROM, EEPROM), or a flash memory.
  • the volatile memory may be random access memory (RAM), which is used as an external cache.
• for example, many forms of RAM are available, such as static random access memory (static RAM, SRAM), dynamic random access memory (dynamic RAM, DRAM), synchronous dynamic random access memory (synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (double data rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (enhanced SDRAM, ESDRAM), synchlink dynamic random access memory (synchlink DRAM, SLDRAM), and direct rambus random access memory (direct rambus RAM, DR RAM).
• the foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • the above-mentioned embodiments may be implemented in the form of a computer program product in whole or in part.
  • the computer program product includes one or more computer instructions or computer programs.
  • the computer instructions or computer programs are loaded or executed on the computer, the processes or functions described in the embodiments of the present application are generated in whole or in part.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
• for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired or wireless manner.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or a data center that includes one or more sets of available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium.
  • the semiconductor medium may be a solid state drive.
• "at least one" refers to one or more, and "multiple" refers to two or more.
• "at least one of the following items" or a similar expression refers to any combination of these items, including any combination of a single item or a plurality of items.
• for example, at least one of a, b, or c can mean: a, b, c, a and b, a and c, b and c, or a, b and c, where a, b, and c can each be singular or plural.
• the size of the sequence numbers of the above-mentioned processes does not imply their order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
  • the disclosed system, device, and method may be implemented in other ways.
• the device embodiments described above are merely illustrative; for example, the division of the units is only a logical function division, and there may be other divisions in actual implementation, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
• if the function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of the present application essentially or the part that contributes to the existing technology or the part of the technical solution can be embodied in the form of a software product, and the computer software product is stored in a storage medium, including Several instructions are used to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the methods described in the various embodiments of the present application.
• the aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (read-only memory, ROM), a random access memory (random access memory, RAM), a magnetic disk, an optical disk, or other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Machine Translation (AREA)

Abstract

A method for training a sentence paraphrase model, a sentence paraphrase method, and apparatuses thereof, relating to natural language processing technology in the field of artificial intelligence. The method for training a sentence paraphrase model includes: obtaining training data, where the training data includes multiple sentences, the multiple sentences are in different languages, and the multiple sentences have the same meaning (710); and training a sentence paraphrase model according to the training data, where the sentence paraphrase model is used to generate, based on an input sentence, a paraphrase sentence of the input sentence (720). The method for training a sentence paraphrase model makes it possible to obtain paraphrase sentences conveniently.

Description

训练语句复述模型的方法、语句复述方法及其装置
本申请要求于2019年11月01日提交中国专利局、申请号为201911061874.7、申请名称为“训练语句复述模型的方法、语句复述方法及其装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及人工智能领域,并且更具体地,涉及训练语句复述模型的方法、语句复述方法及其装置。
背景技术
人工智能(artificial intelligence,AI)是利用数字计算机或者数字计算机控制的机器模拟、延伸和扩展人的智能,感知环境、获取知识并使用知识获得最佳结果的理论、方法、技术及应用系统。换句话说,人工智能是计算机科学的一个分支,它企图了解智能的实质,并生产出一种新的能以人类智能相似的方式作出反应的智能机器。人工智能也就是研究各种智能机器的设计原理与实现方法,使机器具有感知、推理与决策的功能。
随着人工智能技术的不断发展,让人机之间能够通过自然语言进行交互的自然语言人机交互系统变的越来越重要。人机之间能够通过自然语言进行交互,就需要系统能够识别出人类自然语言的具体含义。通常,系统通过采用对自然语言的句子进行关键信息提取来识别句子的具体含义。
复述(paraphrase)是指对于语句进行相同语义的不同表达,复述在自然语言中非常普遍,在自然语言处理(natural language processing,NLP)领域里,复述也得到了越来越广泛的应用。因此,如何获得复述语句成为一个亟需解决的技术问题。
发明内容
本申请提供一种训练语句复述模型的方法、语句复述方法及其装置,能够便捷地获得复述语句。
第一方面,提供了一种训练语句复述模型的方法,该方法包括:获取训练数据,所述训练数据包括多个语句,所述多个语句的语种不同,且所述多个语句具有相同含义;根据所述训练数据,训练语句复述模型,所述语句复述模型用于基于输入语句生成所述输入语句的复述语句,所述复述语句与所述输入语句具有相同含义,所述复述语句的语种与所述输入语句的语种相同或不同。
在本申请实施例中,所述训练数据包括语种不同、且具有相同含义的多个语句,直接使用该训练数据训练语句复述模型,而并不依赖复述语句对语料对语句复述模型进行训练,可以降低训练语句复述模型的成本,从而能够便捷地获得复述语句。
需要说明的是,在实际中,往往需要通过人工校准才能得到高质量的复述语句对语料, 因此,会导致训练语句复述模型的成本较高。
可选地,所述语句复述模型可以为语言模型。
例如,所述语句复述模型可以为Transformer语言模型、CNN语言模型或RNN语言模型等,或者,所述语句复述模型也可以为其他深度学习的语言模型,本申请实施例对此并不限定。
可选地,所述训练数据可以包括一个语句、及该语句对应的不同语种的一个或多个译文。
可选地,所述训练数据包括的多个语句可以为机器翻译语料中的语句。需要说明的是,这里所说的机器翻译语料库可以为多语种平行语料,例如,双语平行(中英文)的机器翻译语料。
例如,所述训练数据可以包括一个中文语句、及该中文语句对应的英文译文。
可选地,所述训练数据还可以包括语种类型指示信息。其中,所述语种类型指示信息可以为中文类型指示信息、英文类型指示信息或法文类型指示信息等。
例如,所述训练数据包括中文语句及该中文语句对应的英文译文拼接成的序列时,在该序列中,所述中文语句的前面可以包括中文类型指示信息,该中文类型指示信息可以指示其后的语句为中文语句,比如,该中文类型指示信息可以为“<zh>”。
类似地,所述英文语句(即该中文语句对应的英文译文)的前面可以包括英文类型指示信息,该英文类型指示信息可以指示其后的语句为英文语句,比如,该英文类型指示信息可以为“<en>”。
本领域技术人员可以理解,上述关于语种类型指示信息的描述仅为示例而非限定,本申请实施例对语种类型指示信息在训练数据中的位置和形式并不限定。
可选地,所述训练数据还可以包括法文、德文、西班牙文等其他语种的语句。例如,所述训练数据可以包括中文语句、及该文语句对应的法文译文、该中文语句对应的德文译文。
结合第一方面,在第一方面的某些实现方式中,所述语句复述模型中包括语种指示参数,所述语种指示参数用于指示所述语句复述模型生成的复述语句的语种;其中,所述根据所述训练数据,训练语句复述模型,包括:根据所述训练数据及所述语种指示参数,训练所述语句复述模型。
在本申请实施例中,所述语句复述模型中包括语种指示参数,根据所述训练数据及所述语种指示参数,训练所述语句复述模型,可以使得所述语种复述模型能够(根据所述语种指示参数)生成多种不同语种的复述语句。
可选地,所述语种指示参数为所述语句复述模型的参数。
结合第一方面,在第一方面的某些实现方式中,所述复述语句的语种是根据所述语种指示参数确定的,所述方法还包括:获取语种指示信息,所述语种指示信息用于确定所述语种指示参数;根据所述语种指示信息确定所述语种指示参数。
应理解,上述根据所述语种指示信息确定所述语种指示参数可以理解为:根据所述语种指示信息设置所述语种指示参数。从而可以根据设置好的所述语种指示参数控制所述语句复述模型生成的复述语句的语种。
在本申请实施例中,通过所述语种指示信息设置所述语种指示参数,根据所述语种指 示参数,能够灵活地、便捷地控制所述语句复述模型生成的复述语句的语种。
结合第一方面,在第一方面的某些实现方式中,所述训练数据包括的多个语句中的至少一个语句为经过扰动处理的语句,所述扰动处理包括随机删除语句中的词、随机调换语句中词的词序以及随机向语句中插入词中的至少一项。
在本申请实施例中,对所述训练数据进行扰动处理后,再使用该加扰后的训练数据训练所述语句复述模型,可以提高所述语句复述模型的鲁棒性,使得生成的复述语句的语义更加准确,形式更加多样。
所述扰动处理还可以包括对语句进行的其他扰动处理,本申请实施例对此并不限定。
可选地,所述扰动处理可以为降噪自编码器技术(denoising auto-encoder,DAE)。
结合第一方面,在第一方面的某些实现方式中,在所述获取训练数据之前,所述方法还包括:获取预训练数据,所述预训练数据包括一个或多个语句,所述预训练数据包括的语句的语种为所述训练数据包括的语句的语种中的一个或多个;根据所述预训练数据,训练所述语句复述模型。
在本申请实施例中,使用包括至少一个语句的预训练数据对所述语句复述模型进行训练,可以提高所述语句复述模型生成的语句(复述语句)的流畅性。
可选地,在使用所述预训练数据训练所述语句复述模型时,可以将所述预训练数据作为所述语句复述模型的输入,同时,可以将该预训练数据中的语句作为该输入的标签(即真值),采用反向传播(backpropagation)算法,不断地获取预训练数据进行迭代训练,直至损失函数收敛,则完成该语句复述模型的训练。
第二方面,提供了一种语句复述方法,该方法包括:获取输入语句;通过语句复述模型,对所述输入语句进行复述,生成所述输入语句的复述语句,所述复述语句的语种与所述输入语句的语种相同或不同;其中,所述语句复述模型是使用训练数据训练后得到的,所述训练数据包括多个语句,所述多个语句的语种不同,且所述多个语句具有相同含义。
在本申请实施例中,所述训练数据包括语种不同、且具有相同含义的多个语句,训练语句复述模型是直接使用该训练数据训练后得到的,而并不依赖复述语句对语料对语句复述模型进行训练,可以降低训练语句复述模型的成本,从而能够便捷地获得复述语句。
需要说明的是,在实际中,往往需要通过人工校准才能得到高质量的复述语句对语料,因此,会导致训练语句复述模型的成本较高。
可选地,所述语句复述模型可以为语言模型。
例如,所述语句复述模型可以为Transformer语言模型、CNN语言模型或RNN语言模型等,或者,所述语句复述模型也可以为其他深度学习的语言模型,本申请实施例对此并不限定。
可选地,所述训练数据可以包括一个语句、及该语句对应的不同语种的一个或多个译文。
可选地,所述训练数据包括的多个语句可以为机器翻译语料中的语句。需要说明的是,这里所说的机器翻译语料库可以为多语种平行语料,例如,双语平行(中英文)的机器翻译语料。
例如,所述训练数据可以包括一个中文语句、及该中文语句对应的英文译文。
可选地,所述训练数据还可以包括语种类型指示信息。其中,所述语种类型指示信息 可以为中文类型指示信息、英文类型指示信息或法文类型指示信息等。
例如,所述训练数据包括中文语句及该中文语句对应的英文译文拼接成的序列时,在该序列中,所述中文语句的前面可以包括中文类型指示信息,该中文类型指示信息可以指示其后的语句为中文语句,比如,该中文类型指示信息可以为“<zh>”。
类似地,所述英文语句(即该中文语句对应的英文译文)的前面可以包括英文类型指示信息,该英文类型指示信息可以指示其后的语句为英文语句,比如,该英文类型指示信息可以为“<en>”。
本领域技术人员可以理解,上述关于语种类型指示信息的描述仅为示例而非限定,本申请实施例对语种类型指示信息在训练数据中的位置和形式并不限定。
可选地,所述训练数据还可以包括法文、德文、西班牙文等其他语种的语句。例如,所述训练数据可以包括中文语句、及该文语句对应的法文译文、该中文语句对应的德文译文。
结合第二方面,在第二方面的某些实现方式中,所述语句复述模型中包括语种指示参数,所述语种指示参数用于指示所述语句复述模型生成的复述语句的语种,其中,所述语句复述模型是根据所述训练数据及所述语种指示参数训练后得到的。
在本申请实施例中,所述语句复述模型中包括语种指示参数,根据所述训练数据及所述语种指示参数,训练所述语句复述模型,可以使得所述语种复述模型能够(根据所述语种指示参数)生成多种不同语种的复述语句。
可选地,所述语种指示参数为所述语句复述模型的参数。
结合第二方面,在第二方面的某些实现方式中,所述复述语句的语种是根据所述语种指示参数确定的,所述语种指示参数是根据获取到的语种指示信息确定的。
应理解,上述根据所述语种指示信息确定所述语种指示参数可以理解为:根据所述语种指示信息设置所述语种指示参数。从而可以根据设置好的所述语种指示参数控制所述语句复述模型生成的复述语句的语种。
在本申请实施例中,通过所述语种指示信息设置所述语种指示参数,根据所述语种指示参数,能够灵活地、便捷地控制所述语句复述模型生成的复述语句的语种。
结合第二方面,在第二方面的某些实现方式中,所述训练数据包括的多个语句中的至少一个语句为经过扰动处理的语句,所述扰动处理包括随机删除语句中的词、随机调换语句中词的词序以及随机向语句中插入词中的至少一项。
在本申请实施例中,对所述训练数据进行扰动处理后,再使用该加扰后的训练数据训练所述语句复述模型,可以提高所述语句复述模型的鲁棒性,使得生成的复述语句的语义更加准确,形式更加多样。
所述扰动处理还可以包括对语句进行的其他扰动处理,本申请实施例对此并不限定。
可选地,所述扰动处理可以为降噪自编码器技术(denoising auto-encoder,DAE)。
结合第二方面,在第二方面的某些实现方式中,所述语句复述模型是使用预训练数据训练后、再使用所述训练数据训练后得到的,所述预训练数据包括一个或多个语句,所述预训练数据包括的语句的语种为所述训练数据包括的语句的语种中的一个或多个。
在本申请实施例中,使用包括至少一个语句的预训练数据对所述语句复述模型进行训练,可以提高所述语句复述模型生成的语句(复述语句)的流畅性。
可选地,在使用所述预训练数据训练所述语句复述模型时,可以将所述预训练数据作为所述语句复述模型的输入,同时,可以将该预训练数据中的语句作为该输入的标签(即真值),采用反向传播(backpropagation)算法,不断地获取预训练数据进行迭代训练,直至损失函数收敛,则完成该语句复述模型的训练。
第三方面,提供了一种训练语句复述模型的装置,包括:获取模块,用于获取训练数据,所述训练数据包括多个语句,所述多个语句的语种不同,且所述多个语句具有相同含义;训练模块,用于根据所述训练数据,训练语句复述模型,所述语句复述模型用于基于输入语句生成所述输入语句的复述语句,所述复述语句与所述输入语句具有相同含义,所述复述语句的语种与所述输入语句的语种相同或不同。
在本申请实施例中,所述训练数据包括语种不同、且具有相同含义的多个语句,直接使用该训练数据训练语句复述模型,而并不依赖复述语句对语料对语句复述模型进行训练,可以降低训练语句复述模型的成本,从而能够便捷地获得复述语句。
需要说明的是,在实际中,往往需要通过人工校准才能得到高质量的复述语句对语料,因此,会导致训练语句复述模型的成本较高。
可选地,所述语句复述模型可以为语言模型。
例如,所述语句复述模型可以为Transformer语言模型、CNN语言模型或RNN语言模型等,或者,所述语句复述模型也可以为其他深度学习的语言模型,本申请实施例对此并不限定。
可选地,所述训练数据可以包括一个语句、及该语句对应的不同语种的一个或多个译文。
可选地,所述训练数据包括的多个语句可以为机器翻译语料中的语句。需要说明的是,这里所说的机器翻译语料库可以为多语种平行语料,例如,双语平行(中英文)的机器翻译语料。
例如,所述训练数据可以包括一个中文语句、及该中文语句对应的英文译文。
可选地,所述训练数据还可以包括语种类型指示信息。其中,所述语种类型指示信息可以为中文类型指示信息、英文类型指示信息或法文类型指示信息等。
例如,所述训练数据包括中文语句及该中文语句对应的英文译文拼接成的序列时,在该序列中,所述中文语句的前面可以包括中文类型指示信息,该中文类型指示信息可以指示其后的语句为中文语句,比如,该中文类型指示信息可以为“<zh>”。
类似地,所述英文语句(即该中文语句对应的英文译文)的前面可以包括英文类型指示信息,该英文类型指示信息可以指示其后的语句为英文语句,比如,该英文类型指示信息可以为“<en>”。
本领域技术人员可以理解,上述关于语种类型指示信息的描述仅为示例而非限定,本申请实施例对语种类型指示信息在训练数据中的位置和形式并不限定。
可选地,所述训练数据还可以包括法文、德文、西班牙文等其他语种的语句。例如,所述训练数据可以包括中文语句、及该文语句对应的法文译文、该中文语句对应的德文译文。
结合第三方面,在第三方面的某些实现方式中,所述语句复述模型中包括语种指示参数,所述语种指示参数用于指示所述语句复述模型生成的复述语句的语种;其中,所述训 练模块具体用于:根据所述训练数据及所述语种指示参数,训练所述语句复述模型。
在本申请实施例中,所述语句复述模型中包括语种指示参数,根据所述训练数据及所述语种指示参数,训练所述语句复述模型,可以使得所述语种复述模型能够(根据所述语种指示参数)生成多种不同语种的复述语句。
可选地,所述语种指示参数为所述语句复述模型的参数。
结合第三方面,在第三方面的某些实现方式中,所述复述语句的语种是根据所述语种指示参数确定的,所述获取模块还用于:获取语种指示信息;所述训练模块还用于:根据所述语种指示信息确定所述语种指示参数。
应理解,上述根据所述语种指示信息确定所述语种指示参数可以理解为:根据所述语种指示信息设置所述语种指示参数。从而可以根据设置好的所述语种指示参数控制所述语句复述模型生成的复述语句的语种。
在本申请实施例中,通过所述语种指示信息设置所述语种指示参数,根据所述语种指示参数,能够灵活地、便捷地控制所述语句复述模型生成的复述语句的语种。
结合第三方面,在第三方面的某些实现方式中,所述训练数据包括的多个语句中的至少一个语句为经过扰动处理的语句,所述扰动处理包括随机删除语句中的词、随机调换语句中词的词序以及随机向语句中插入词中的至少一项。
在本申请实施例中,对所述训练数据进行扰动处理后,再使用该加扰后的训练数据训练所述语句复述模型,可以提高所述语句复述模型的鲁棒性,使得生成的复述语句的语义更加准确,形式更加多样。
所述扰动处理还可以包括对语句进行的其他扰动处理,本申请实施例对此并不限定。
可选地,所述扰动处理可以为降噪自编码器技术(denoising auto-encoder,DAE)。
结合第三方面,在第三方面的某些实现方式中,所述获取模块还用于:获取预训练数据,所述预训练数据包括一个或多个语句,所述预训练数据包括的语句的语种为所述训练数据包括的语句的语种中的一个或多个;所述装置还包括预训练模块,用于:根据所述预训练数据,训练所述语句复述模型。
在本申请实施例中,使用包括至少一个语句的预训练数据对所述语句复述模型进行训练,可以提高所述语句复述模型生成的语句(复述语句)的流畅性。
可选地,在使用所述预训练数据训练所述语句复述模型时,可以将所述预训练数据作为所述语句复述模型的输入,同时,可以将该预训练数据中的语句作为该输入的标签(即真值),采用反向传播(backpropagation)算法,不断地获取预训练数据进行迭代训练,直至损失函数收敛,则完成该语句复述模型的训练。
第四方面,提供了一种语句复述装置,包括:获取模块,用于获取输入语句;复述模块,用于通过语句复述模型,对所述输入语句进行复述,生成所述输入语句的复述语句,所述复述语句的语种与所述输入语句的语种相同或不同;其中,所述语句复述模型是使用训练数据训练后得到的,所述训练数据包括多个语句,所述多个语句的语种不同,且所述多个语句具有相同含义。
在本申请实施例中,所述训练数据包括语种不同、且具有相同含义的多个语句,训练语句复述模型是直接使用该训练数据训练后得到的,而并不依赖复述语句对语料对语句复述模型进行训练,可以降低训练语句复述模型的成本,从而能够便捷地获得复述语句。
需要说明的是,在实际中,往往需要通过人工校准才能得到高质量的复述语句对语料,因此,会导致训练语句复述模型的成本较高。
可选地,所述语句复述模型可以为语言模型。
例如,所述语句复述模型可以为Transformer语言模型、CNN语言模型或RNN语言模型等,或者,所述语句复述模型也可以为其他深度学习的语言模型,本申请实施例对此并不限定。
可选地,所述训练数据可以包括一个语句、及该语句对应的不同语种的一个或多个译文。
可选地,所述训练数据包括的多个语句可以为机器翻译语料中的语句。需要说明的是,这里所说的机器翻译语料库可以为多语种平行语料,例如,双语平行(中英文)的机器翻译语料。
例如,所述训练数据可以包括一个中文语句、及该中文语句对应的英文译文。
可选地,所述训练数据还可以包括语种类型指示信息。其中,所述语种类型指示信息可以为中文类型指示信息、英文类型指示信息或法文类型指示信息等。
例如,所述训练数据包括中文语句及该中文语句对应的英文译文拼接成的序列时,在该序列中,所述中文语句的前面可以包括中文类型指示信息,该中文类型指示信息可以指示其后的语句为中文语句,比如,该中文类型指示信息可以为“<zh>”。
类似地,所述英文语句(即该中文语句对应的英文译文)的前面可以包括英文类型指示信息,该英文类型指示信息可以指示其后的语句为英文语句,比如,该英文类型指示信息可以为“<en>”。
本领域技术人员可以理解,上述关于语种类型指示信息的描述仅为示例而非限定,本申请实施例对语种类型指示信息在训练数据中的位置和形式并不限定。
可选地,所述训练数据还可以包括法文、德文、西班牙文等其他语种的语句。例如,所述训练数据可以包括中文语句、及该文语句对应的法文译文、该中文语句对应的德文译文。
结合第四方面,在第四方面的某些实现方式中,所述语句复述模型中包括语种指示参数,所述语种指示参数用于指示所述语句复述模型生成的复述语句的语种,其中,所述语句复述模型是根据所述训练数据及所述语种指示参数训练后得到的。
在本申请实施例中,所述语句复述模型中包括语种指示参数,根据所述训练数据及所述语种指示参数,训练所述语句复述模型,可以使得所述语种复述模型能够(根据所述语种指示参数)生成多种不同语种的复述语句。
可选地,所述语种指示参数为所述语句复述模型的参数。
结合第四方面,在第四方面的某些实现方式中,所述复述语句的语种是根据所述语种指示参数确定的,所述语种指示参数是根据获取到的语种指示信息确定的。
应理解,上述根据所述语种指示信息确定所述语种指示参数可以理解为:根据所述语种指示信息设置所述语种指示参数。从而可以根据设置好的所述语种指示参数控制所述语句复述模型生成的复述语句的语种。
在本申请实施例中,通过所述语种指示信息设置所述语种指示参数,根据所述语种指示参数,能够灵活地、便捷地控制所述语句复述模型生成的复述语句的语种。
结合第四方面,在第四方面的某些实现方式中,所述训练数据包括的多个语句中的至少一个语句为经过扰动处理的语句,所述扰动处理包括随机删除语句中的词、随机调换语句中词的词序以及随机向语句中插入词中的至少一项。
在本申请实施例中,对所述训练数据进行扰动处理后,再使用该加扰后的训练数据训练所述语句复述模型,可以提高所述语句复述模型的鲁棒性,使得生成的复述语句的语义更加准确,形式更加多样。
所述扰动处理还可以包括对语句进行的其他扰动处理,本申请实施例对此并不限定。
可选地,所述扰动处理可以为降噪自编码器技术(denoising auto-encoder,DAE)。
结合第四方面,在第四方面的某些实现方式中,所述语句复述模型是使用预训练数据训练后、再使用所述训练数据训练后得到的,所述预训练数据包括一个或多个语句,所述预训练数据包括的语句的语种为所述训练数据包括的语句的语种中的一个或多个。
在本申请实施例中,使用包括至少一个语句的预训练数据对所述语句复述模型进行训练,可以提高所述语句复述模型生成的语句(复述语句)的流畅性。
可选地,在使用所述预训练数据训练所述语句复述模型时,可以将所述预训练数据作为所述语句复述模型的输入,同时,可以将该预训练数据中的语句作为该输入的标签(即真值),采用反向传播(backpropagation)算法,不断地获取预训练数据进行迭代训练,直至损失函数收敛,则完成该语句复述模型的训练。
第五方面,提供了一种训练语句复述模型的装置,该装置包括:存储器,用于存储程序;处理器,用于执行所述存储器存储的程序,当所述存储器存储的程序被执行时,所述处理器用于执行上述第一方面中的任意一种实现方式中的方法。
第六方面,提供了一种语句复述装置,该装置包括:存储器,用于存储程序;处理器,用于执行所述存储器存储的程序,当所述存储器存储的程序被执行时,所述处理器用于执行上述第二方面中的任意一种实现方式中的方法。
上述第五方面和第六方面中的处理器既可以是中央处理器(central processing unit,CPU),也可以是CPU与神经网络运算处理器的组合,这里的神经网络运算处理器可以包括图形处理器(graphics processing unit,GPU)、神经网络处理器(neural-network processing unit,NPU)和张量处理器(tensor processing unit,TPU)等等。其中,TPU是谷歌(google)为机器学习全定制的人工智能加速器专用集成电路。
第七方面,提供一种计算机可读介质,该计算机可读介质存储用于设备执行的程序代码,该程序代码包括用于执行第一方面或第二方面中的任意一种实现方式中的方法。
第八方面,提供一种包含指令的计算机程序产品,当该计算机程序产品在计算机上运行时,使得计算机执行上述第一方面或第二方面中的任意一种实现方式中的方法。
第九方面,提供一种芯片,所述芯片包括处理器与数据接口,所述处理器通过所述数据接口读取存储器上存储的指令,执行上述第一方面或第二方面中的任意一种实现方式中的方法。
可选地,作为一种实现方式,所述芯片还可以包括存储器,所述存储器中存储有指令,所述处理器用于执行所述存储器上存储的指令,当所述指令被执行时,所述处理器用于执行第一方面或第二方面中的任意一种实现方式中的方法。
上述芯片具体可以是现场可编程门阵列(field-programmable gate array,FPGA)或 者专用集成电路(application-specific integrated circuit,ASIC)。
第十方面,提供了一种电子设备,该电子设备包括上述第三方面中的任意一个方面中的训练语句复述装置的装置,或者,该电子设备包括上述第四方面中的任意一个方面中的语句复述装置。
当上述电子设备包括上述第三方面中的任意一个方面中的训练语句复述装置的装置时,该电子设备具体可以是服务器。
当上述电子设备包括上述第四方面中的任意一个方面中的语句复述装置时,该电子设备具体可以是终端设备。
在本申请实施例中,所述训练数据包括语种不同、且具有相同含义的多个语句,直接使用该训练数据训练语句复述模型,而并不依赖复述语句对语料对语句复述模型进行训练,可以降低训练语句复述模型的成本,从而能够便捷地获得复述语句。
附图说明
图1是本申请实施例提供的一种自然语言处理的应用场景示意图;
图2是本申请实施例提供的另一种自然语言处理的应用场景示意图;
图3是本申请实施例提供的自然语言处理的相关设备的示意图;
图4是本申请实施例提供的一种系统架构的示意图;
图5是本申请实施例提供的一种芯片的硬件结构的示意图;
图6是本申请实施例提供的一种应用场景示意图;
图7是本申请一个实施例提供的训练语句复述模型的方法的示意性流程图;
图8是本申请另一个实施例提供的训练语句复述模型的方法的示意性流程图;
图9是本申请一个实施例提供的训练语句复述模型的示意性框图;
图10是本申请另一个实施例提供的训练语句复述模型的示意性框图;
图11是本申请实施例提供的语句复述方法的示意性流程图;
图12是本申请实施例的语句复述装置的硬件结构示意图;
图13是本申请实施例的训练语句复述模型的装置的硬件结构示意图。
具体实施方式
下面将结合附图,对本申请中的技术方案进行描述。
为了更好地理解本申请实施例的方案,下面先结合图1至图3对本申请实施例可能的应用场景进行简单的介绍。
图1示出了一种自然语言处理系统,该自然语言处理系统包括用户设备以及数据处理设备。其中,用户设备包括手机、个人电脑或者信息处理中心等智能终端。用户设备为自然语言数据处理的发起端,作为语言问答或者查询等请求的发起方,通常用户通过用户设备发起请求。
上述数据处理设备可以是云服务器、网络服务器、应用服务器以及管理服务器等具有数据处理功能的设备或服务器。
数据处理设备通过交互接口接收来自智能终端的查询语句/语音/文本等问句,再通过存储数据的存储器以及数据处理的处理器环节进行机器学习,深度学习,搜索,推理,决 策等方式的语言数据处理。数据处理设备中的存储器可以是一个统称,包括本地存储以及存储历史数据的数据库,数据库可以在数据处理设备上,也可以在其它网络服务器上。
在图1所示的自然语言处理系统中,用户设备可以接收用户的指令,以请求对输入语句(例如,该输入语句可以是用户输入的一个句子)进行复述得到复述语句(例如,该复述语句可以是复述得到的、与输入语句具有相同语义的不同表达),然后向数据处理设备发送输入语句,从而使得数据处理设备对输入语句进行复述得到复述语句。
在图1中,数据处理设备可以执行本申请实施例的语句复述方法。
复述(paraphrase)是指对输入语句进行相同语义的不同表达,例如,输入语句为“太阳到地球的距离是多少”,对该输入语句进行复述可以得到“太阳离地球有多远”、“从地球到太阳有多少公里”、“地球距离太阳多少千米”、“地球与太阳相距多大”、“地日距离是多少”等复述语句,这些复述语句都表达了与输入语句相同或相似的语义,即太阳与地球之间的距离是多少,因此,这些语句就可以称为输入语句的复述语句。
换句话说,可以说这些语句与输入语句互为复述;或者,也可以说这些语句与输入语句互为复述语句。
在本申请实施例中,复述可以包括词汇级别、短语级别及句子级别等不同层次的复述,即输入语句和复述语句均可以为语汇、短语或句子,本申请实施例对此并不限定。
例如,词汇级别的复述即通常所说的同义词,例如,词汇级别的复述可以包括:“番茄”和“西红柿”、“car”和“vehicle”。
例如,短语级别的复述可以包括:“北京大学”和“北大”、“consider”和“take…into consideration”。
例如,句子级别的复述可以包括:“姚明的身高是多少?”和“姚明有多高?”、“Messi plays for FC Barcelona in the Spanish Primera League.”和“Messi is a player of Barca in La Liga.”。
需要说明的是,本申请实施例中并不限定输入语句及其复述语句的语种(或者说语言)。输入语句和复述语句可以为中文、英文、德文、法文等各种语种,本申请实施例对此并不限定。
例如,输入语句与复述语句可以为中文;或者,输入语句与复述语句可以为英文,本申请实施例对此并不限定。
图2示出了另一种自然语言处理系统,在图2中,用户设备直接作为数据处理设备,该用户设备能够直接接收来自用户的输入并直接由用户设备本身的硬件进行处理,具体过程与图1相似,可参考上面的描述,在此不再赘述。
在图2所示的自然语言处理系统中,用户设备可以接收用户的指令,由用户设备自身对输入语句进行复述得到复述语句。
在图2中,用户设备自身就可以执行本申请实施例的语句复述方法。
图3是本申请实施例提供的自然语言处理的相关设备的示意图。
上述图1和图2中的用户设备具体可以是图3中的本地设备301或者本地设备302,图1中的数据处理设备具体可以是图3中的执行设备210,其中,数据存储系统250可以存储执行设备210的待处理数据,数据存储系统250可以集成在执行设备210上,也可以设置在云上或其它网络服务器上。
图1和图2中的数据处理设备可以通过神经网络模型或者其它模型(例如,基于支持向量机的模型)进行数据训练/机器学习/深度学习,并利用数据最终训练或者学习得到的模型对输入语句进行复述得到复述语句。
由于本申请实施例涉及大量神经网络的应用,为了便于理解,下面先对本申请实施例可能涉及的神经网络的相关术语和概念进行介绍。
(1)神经网络
神经网络可以是由神经单元组成的,神经单元可以是指以x_s和截距1为输入的运算单元,该运算单元的输出可以为:
$h_{W,b}(x)=f(W^{T}x)=f\left(\sum_{s=1}^{n}W_{s}x_{s}+b\right)$
其中,s=1、2、……n,n为大于1的自然数,W_s为x_s的权重,b为神经单元的偏置。f为神经单元的激活函数(activation functions),用于将非线性特性引入神经网络中,来将神经单元中的输入信号转换为输出信号。该激活函数的输出信号可以作为下一层的输入,激活函数可以是sigmoid函数。神经网络是将多个上述单一的神经单元联结在一起形成的网络,即一个神经单元的输出可以是另一个神经单元的输入。每个神经单元的输入可以与前一层的局部接受域相连,来提取局部接受域的特征,局部接受域可以是由若干个神经单元组成的区域。
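作为示意,下面给出一段代码示例(假设使用Python与NumPy,数值与变量名均为示例,并非本申请的必要实现),演示上述神经单元的计算:对输入x_s加权求和并加上偏置b后,再经过sigmoid激活函数得到输出。

```python
import numpy as np

def sigmoid(z):
    # sigmoid激活函数,将输入信号映射到(0, 1)区间
    return 1.0 / (1.0 + np.exp(-z))

# 示例输入x_s(s = 1, 2, 3)、权重W_s与偏置b,数值仅为示意
x = np.array([0.5, -1.2, 0.3])
W = np.array([0.8, 0.1, -0.4])
b = 0.2

# 神经单元的输出:f(sum_s W_s * x_s + b)
h = sigmoid(np.dot(W, x) + b)
print(h)
```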
(2)深度神经网络
深度神经网络(deep neural network,DNN),也称多层神经网络,可以理解为具有多层隐含层的神经网络。按照不同层的位置对DNN进行划分,DNN内部的神经网络可以分为三类:输入层,隐含层,输出层。一般来说第一层是输入层,最后一层是输出层,中间的层都是隐含层。层与层之间是全连接的,也就是说,第i层的任意一个神经元一定与第i+1层的任意一个神经元相连。
虽然DNN看起来很复杂,但是就每一层的工作来说,其实并不复杂,简单来说就是如下线性关系表达式:
$\vec{y}=\alpha(W\vec{x}+\vec{b})$
其中,$\vec{x}$是输入向量,$\vec{y}$是输出向量,$\vec{b}$是偏移向量,W是权重矩阵(也称系数),α()是激活函数。每一层仅仅是对输入向量$\vec{x}$经过如此简单的操作得到输出向量$\vec{y}$。由于DNN层数多,系数W和偏移向量$\vec{b}$的数量也比较多。这些参数在DNN中的定义如下所述:以系数W为例:假设在一个三层的DNN中,第二层的第4个神经元到第三层的第2个神经元的线性系数定义为$W_{24}^{3}$,上标3代表系数W所在的层数,而下标对应的是输出的第三层索引2和输入的第二层索引4。
综上,第L-1层的第k个神经元到第L层的第j个神经元的系数定义为$W_{jk}^{L}$。
需要注意的是,输入层是没有W参数的。在深度神经网络中,更多的隐含层让网络更能够刻画现实世界中的复杂情形。理论上而言,参数越多的模型复杂度越高,"容量"也就越大,也就意味着它能完成更复杂的学习任务。训练深度神经网络的过程也就是学习权重矩阵的过程,其最终目的是得到训练好的深度神经网络的所有层的权重矩阵(由很多层的向量W形成的权重矩阵)。
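作为示意,下面的代码(假设使用Python与NumPy,网络规模与数值仅为示例)演示一个三层全连接网络逐层计算$\vec{y}=\alpha(W\vec{x}+\vec{b})$的过程,以及如何按上述约定索引权重系数$W_{24}^{3}$。

```python
import numpy as np

def relu(z):
    # 激活函数alpha(),此处以ReLU为例
    return np.maximum(0.0, z)

rng = np.random.default_rng(0)

# 第1层(输入层)4个神经元,第2层5个神经元,第3层(输出层)3个神经元,规模仅为示意
sizes = [4, 5, 3]
Ws = [rng.standard_normal((sizes[i + 1], sizes[i])) for i in range(len(sizes) - 1)]
bs = [np.zeros(sizes[i + 1]) for i in range(len(sizes) - 1)]

a = rng.standard_normal(sizes[0])   # 输入向量x
for W, b in zip(Ws, bs):
    a = relu(W @ a + b)             # 每一层:y = alpha(W x + b)
print(a)                            # 输出向量y

# Ws[1]对应第2层到第3层的权重矩阵,Ws[1][1, 3]即W^3_24:
# 第2层第4个神经元到第3层第2个神经元的系数(代码中索引从0开始)
print(Ws[1][1, 3])
```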
(3)卷积神经网络
卷积神经网络(convolutional neuron network,CNN)是一种带有卷积结构的深度神经网络。卷积神经网络包含了一个由卷积层和子采样层构成的特征抽取器,该特征抽取器可以看作是滤波器。卷积层是指卷积神经网络中对输入信号进行卷积处理的神经元层。在卷积神经网络的卷积层中,一个神经元可以只与部分邻层神经元连接。一个卷积层中,通常包含若干个特征平面,每个特征平面可以由一些矩形排列的神经单元组成。同一特征平面的神经单元共享权重,这里共享的权重就是卷积核。共享权重可以理解为提取图像信息的方式与位置无关。卷积核可以以随机大小的矩阵的形式初始化,在卷积神经网络的训练过程中卷积核可以通过学习得到合理的权重。另外,共享权重带来的直接好处是减少卷积神经网络各层之间的连接,同时又降低了过拟合的风险。
(4)循环神经网络
循环神经网络(recurrent neural networks,RNN)是用来处理序列数据的。在传统的神经网络模型中,是从输入层到隐含层再到输出层,层与层之间是全连接的,而每一层层内的各个节点之间是无连接的。这种普通的神经网络虽然解决了很多难题,但是却仍然对很多问题无能为力。例如,你要预测句子的下一个单词是什么,一般需要用到前面的单词,因为一个句子中前后单词并不是独立的。RNN之所以称为循环神经网络,是因为一个序列当前的输出与前面的输出也有关。具体的表现形式为网络会对前面的信息进行记忆并应用于当前输出的计算中,即隐含层本层之间的节点不再无连接而是有连接的,并且隐含层的输入不仅包括输入层的输出还包括上一时刻隐含层的输出。理论上,RNN能够对任何长度的序列数据进行处理。
RNN的训练和传统的CNN或DNN的训练一样,同样使用误差反向传播(back propagation,BP)算法,不过有一点区别。例如,如果将RNN进行网络展开,那么参数W,U,V是共享的,而传统神经网络却不是这样。并且在使用梯度下降算法时,每一步的输出不仅依赖当前步的网络,并且还依赖前面若干步网络的状态。比如,在t=4时,还需要向后传递三步,即后面的三步都需要加上各自的梯度。该学习算法称为基于时间的反向传播算法。
既然已经有了卷积神经网络,为什么还要循环神经网络?原因很简单,在卷积神经网络中,有一个前提假设是:元素之间是相互独立的,输入与输出也是独立的,比如猫和狗。但现实世界中,很多元素都是相互连接的,比如股票随时间的变化,再比如一个人说了:我喜欢旅游,其中最喜欢的地方是云南,以后有机会一定要去(__)。这里填空,人类应该都知道是填"云南"。因为人类会根据上下文的内容进行推断,但如何让机器做到这一步?RNN就应运而生了。RNN旨在让机器像人一样拥有记忆的能力。因此,RNN的输出就需要依赖当前的输入信息和历史的记忆信息。
(5)损失函数
在训练深度神经网络的过程中,因为希望深度神经网络的输出尽可能的接近真正想要预测的值,所以可以通过比较当前网络的预测值和真正想要的目标值,再根据两者之间的差异情况来更新每一层神经网络的权重向量(当然,在第一次更新之前通常会有初始化的过程,即为深度神经网络中的各层预先配置参数),比如,如果网络的预测值高了,就调整权重向量让它预测低一些,不断地调整,直到深度神经网络能够预测出真正想要的目标值或与真正想要的目标值非常接近的值。因此,就需要预先定义“如何比较预测值和目标值之间的差异”,这便是损失函数(loss function)或目标函数(objective function),它们是用于衡量预测值和目标值的差异的重要方程。其中,以损失函数举例,损失函数的输出值(loss)越高表示差异越大,那么深度神经网络的训练就变成了尽可能缩小这个loss的过程。
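作为示意(假设使用NumPy,以均方误差为例,数值仅为示例),下面的代码说明:预测值越接近目标值,损失函数的输出值(loss)越小。

```python
import numpy as np

def mse_loss(pred, target):
    # 均方误差:衡量预测值与目标值差异的一种常见损失函数
    return np.mean((pred - target) ** 2)

target = np.array([1.0, 0.0, 1.0])
print(mse_loss(np.array([0.9, 0.2, 0.8]), target))   # 差异较小,loss较低
print(mse_loss(np.array([0.1, 0.9, 0.2]), target))   # 差异较大,loss较高
```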
(6)反向传播算法
神经网络可以采用误差反向传播(back propagation,BP)算法在训练过程中修正初始的神经网络模型中参数的大小,使得神经网络模型的重建误差损失越来越小。具体地,前向传递输入信号直至输出会产生误差损失,通过反向传播误差损失信息来更新初始的神经网络模型中参数,从而使误差损失收敛。反向传播算法是以误差损失为主导的反向传播运动,旨在得到最优的神经网络模型的参数,例如权重矩阵。
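作为示意(假设使用PyTorch,模型与数据均为随意构造的示例),下面的代码演示通过反向传播计算误差损失对参数的梯度,并沿梯度反方向更新参数、使损失逐步收敛的过程。

```python
import torch

# 一个最小的线性模型:y = w * x + b
w = torch.randn(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

x = torch.tensor([1.0, 2.0, 3.0])
y_true = torch.tensor([2.0, 4.0, 6.0])

lr = 0.05
for step in range(100):
    y_pred = w * x + b
    loss = ((y_pred - y_true) ** 2).mean()
    loss.backward()                 # 反向传播,计算误差损失对w、b的梯度
    with torch.no_grad():
        w -= lr * w.grad            # 沿梯度反方向更新参数
        b -= lr * b.grad
        w.grad.zero_()
        b.grad.zero_()
print(float(loss))
```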
如图4所示,本申请实施例提供了一种系统架构100。在图4中,数据采集设备160用于采集训练数据,本申请实施例中训练数据包括不同语种的多个语句。
其中,该多个语句可以具有相同含义。
例如,所述训练数据可以包括由机器翻译语料中的语句及其对应的译文拼接成的序列,即同一个语义、但不同语种的多个语句拼接成的序列。为便于描述,在本申请实施例中,也可以将该拼接成的序列称为训练语句。
需要说明的是,这里的拼接指两个语句依次排列,例如,中文语句可以位于其对应的英文语句的前面,或者,英文语句可以位于其对应的中文语句的前面,本申请实施例对此并不限定。
在采集到训练数据之后,数据采集设备160将这些训练数据存入数据库130,训练设备120基于数据库130中维护的训练数据训练得到目标模型/规则101。
下面对训练设备120基于训练数据得到目标模型/规则101进行描述,训练设备120对训练语句进行处理,得到复述语句,并根据该复述语句确定目标模型/规则101的训练目标(objective),直到目标模型/规则101的奖励大于一定的阈值(和/或小于一定的阈值),从而完成目标模型/规则101的训练。
上述目标模型/规则101能够用于实现本申请实施例的语句复述方法,即将训练语句通过相关预处理(可以采用预处理模块113和/或预处理模块114进行处理)后输入该目标模型/规则101,即可得到复述语句。本申请实施例中的目标模型/规则101具体可以为(多个)神经网络。需要说明的是,在实际的应用中,所述数据库130中维护的训练数据不一定都来自于数据采集设备160的采集,也有可能是从其他设备接收得到的。另外需要说明的是,训练设备120也不一定完全基于数据库130维护的训练数据进行目标模型/规则101的训练,也有可能从云端或其他地方获取训练数据进行模型训练,上述描述不应该作为对本申请实施例的限定。
根据训练设备120训练得到的目标模型/规则101可以应用于不同的系统或设备中,如应用于图4所示的执行设备110,所述执行设备110可以是终端,如手机终端,平板电脑,笔记本电脑,增强现实(augmented reality,AR)/虚拟现实(virtual reality,VR),车载终端等,还可以是服务器或者云端等。在图4中,执行设备110配置输入/输出(input/output,I/O)接口112,用于与外部设备进行数据交互,用户可以通过客户设备140向I/O接口112输入数据,所述输入数据在本申请实施例中可以包括:客户设备输入的训练语句。
预处理模块113和预处理模块114用于根据I/O接口112接收到的输入数据(如训练语句)进行预处理(具体可以是对训练语句进行处理,得到词向量),在本申请实施例中,也可以没有预处理模块113和预处理模块114(也可以只有其中的一个预处理模块),而直接采用计算模块111对输入数据进行处理。
在执行设备110对输入数据进行预处理,或者在执行设备110的计算模块111执行计算等相关的处理过程中,执行设备110可以调用数据存储系统150中的数据、代码等以用于相应的处理,也可以将相应处理得到的数据、指令等存入数据存储系统150中。
最后,I/O接口112将处理结果,例如,复述语句反馈给客户设备140。
值得说明的是,训练设备120可以针对不同的下游系统,生成该下游系统对应的目标模型/规则101,该相应的目标模型/规则101即可以用于实现上述目标或完成上述任务,从而为用户提供所需的结果。
在图4中所示情况下,用户可以手动给定输入数据(例如,输入一段文字),该手动给定可以通过I/O接口112提供的界面进行操作。另一种情况下,客户设备140可以自动地向I/O接口112发送输入数据(例如,输入一段文字),如果要求客户设备140自动发送输入数据需要获得用户的授权,则用户可以在客户设备140中设置相应权限。用户可以在客户设备140查看执行设备110输出的结果,具体的呈现形式可以是显示、声音、动作等具体方式(例如,输出结果可以是复述语句)。客户设备140也可以作为数据采集端,采集如图所示输入I/O接口112的输入数据及输出I/O接口112的输出结果作为新的样本数据,并存入数据库130。当然,也可以不经过客户设备140进行采集,而是由I/O接口112直接将如图所示输入I/O接口112的输入数据及输出I/O接口112的输出结果,作为新的样本数据存入数据库130。
值得注意的是,图4仅是本申请实施例提供的一种系统架构的示意图,图中所示设备、器件、模块等之间的位置关系不构成任何限制。例如,在图4中,数据存储系统150相对执行设备110是外部存储器,在其它情况下,也可以将数据存储系统150置于执行设备110中。
如图4所示,根据训练设备120训练得到目标模型/规则101,该目标模型/规则101可以是本申请实施例中的语句复述模型,具体的,本申请实施例中的语句复述模型可以包括至少一个神经网络,所述至少一个神经网络可以包括CNN,深度卷积神经网络(deep convolutional neural network,DCNN),循环神经网络(recurrent neural network,RNN)、Transformer语言模型等等。
图5为本申请实施例提供的一种芯片的硬件结构的示意图。该芯片包括神经网络处理器(neural processing unit,NPU)50。该芯片可以被设置在如图4所示的执行设备110中,用以完成计算模块111的计算工作。该芯片也可以被设置在如图4所示的训练设备120中,用以完成训练设备120的训练工作并输出目标模型/规则101。本申请实施例中的语句复述模型(语言模型)可在如图5所示的芯片中得以实现。
本申请实施例的语句复述方法具体可以在NPU 50中的运算电路503和/或向量计算单元507中执行,从而得到复述语句。
下面对NPU 50中的各个模块和单元进行简单的介绍。
NPU 50作为协处理器可以挂载到主CPU(host CPU)上,由主CPU分配任务。NPU 50的核心部分为运算电路503,在NPU 50工作时,NPU 50中的控制器504可以控制运算电路503提取存储器(权重存储器或输入存储器)中的数据并进行运算。
在一些实现中,运算电路503内部包括多个处理单元(process engine,PE)。在一些实现中,运算电路503是二维脉动阵列。运算电路503还可以是一维脉动阵列或者能够执行例如乘法和加法这样的数学运算的其它电子线路。在一些实现中,运算电路503是通用的矩阵处理器。
举例来说,假设有输入矩阵A,权重矩阵B,输出矩阵C。运算电路从权重存储器502中取矩阵B相应的数据,并缓存在运算电路中每一个PE上。运算电路从输入存储器501中取矩阵A数据与矩阵B进行矩阵运算,得到的矩阵的部分结果或最终结果,保存在累加器(accumulator)508中。
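作为示意(假设使用NumPy,分块大小与矩阵规模均为示例),下面的代码模拟按分块取出矩阵B,与矩阵A做矩阵运算,并将部分结果在累加器中累加的过程。

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 6))   # 输入矩阵A
B = rng.standard_normal((6, 3))   # 权重矩阵B

tile = 2
C = np.zeros((4, 3))              # 累加器中的输出矩阵C
for k in range(0, A.shape[1], tile):
    # 每次取A、B沿公共维度的一个分块,部分结果累加到C中
    C += A[:, k:k + tile] @ B[k:k + tile, :]

assert np.allclose(C, A @ B)
```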
向量计算单元507可以对运算电路的输出做进一步处理,如向量乘,向量加,指数运算,对数运算,大小比较等等。例如,向量计算单元507可以用于神经网络中非卷积/非全连接层(fully connected layers,FC)层的网络计算,如池化(pooling),批归一化(batch normalization),局部响应归一化(local response normalization)等。
在一些实现中,向量计算单元507能将经处理的输出的向量存储到统一缓存器506。例如,向量计算单元507可以将非线性函数应用到运算电路503的输出,例如累加值的向量,用以生成激活值。在一些实现中,向量计算单元507生成归一化的值、合并值,或二者均有。在一些实现中,处理过的输出的向量能够用作到运算电路503的激活输入,例如用于在神经网络中的后续层中的使用。
统一存储器506用于存放输入数据以及输出数据。
存储单元访问控制器505(direct memory access controller,DMAC)用于将外部存储器中的输入数据搬运到输入存储器501和/或统一存储器506,将外部存储器中的权重数据存入权重存储器502,以及将统一存储器506中的数据存入外部存储器。
总线接口单元(bus interface unit,BIU)510,用于通过总线实现主CPU、DMAC和取指存储器509之间的交互。
与控制器504连接的取指存储器(instruction fetch buffer)509,用于存储控制器504使用的指令;
控制器504,用于调用取指存储器509中缓存的指令,控制该运算加速器的工作过程。
一般地,统一存储器506,输入存储器501,权重存储器502以及取指存储器509均可以为片上(on-chip)存储器。NPU的外部存储器可以为该NPU外部的存储器,该外部存储器可以为双倍数据率同步动态随机存储器(double data rate synchronous dynamic random access memory,DDR SDRAM)、高带宽存储器(high bandwidth memory,HBM)或其他可读可写的存储器。
下面结合附图对本申请实施例的语句复述方法进行详细介绍。本申请实施例的语句复述方法可以由图1中的数据处理设备、图2中的用户设备、图3中的执行设备210以及图4中的执行设备110等设备执行,图4中的执行设备110可以包括图5所示的芯片。
本申请实施例提供的语句复述方法可以在服务器上被执行,也可以在云端被执行,还可以在终端设备上被执行。以终端设备为例,如图6所示,本申请实施例的技术方案可以应用于终端设备,本申请实施例中的语句复述方法可以对输入语句进行复述,得到该输入语句的复述语句。该终端设备可以为移动的或固定的,例如该终端设备可以是具有自然语言处理功能的移动电话、平板个人电脑(tablet personal computer,TPC)、媒体播放器、智能电视、笔记本电脑(laptop computer,LC)、个人数字助理(personal digital assistant,PDA)、个人计算机(personal computer,PC)、照相机、摄像机、智能手表、可穿戴式设备(wearable device,WD)或者自动驾驶的车辆等,本申请实施例对此不作限定。
复述(paraphrase)是指对于语句进行相同语义的不同表达,复述在自然语言中非常普遍,在自然语言处理(natural language processing,NLP)领域里,复述也得到了越来越广泛的应用。例如,复述可以应用于以下多个领域中。
(1)机器翻译
在机器翻译中,可以使用复述技术,对待翻译语句进行同义改写,以生成更容易翻译的语句。例如,将灵活而不规范的口语复述为规范的句子,从而使翻译得到更好的结果;再例如,复述技术也可以缓解机器翻译系统数据稀疏的问题,即通过复述生成增加翻译的训练语料;此外,复述技术也被用于改进机器翻译的评价。
(2)自动问答系统
在问答系统中,可以使用复述技术对问句进行同义扩展,即生成与原问句意义相同的多个问句,从而解决相同问题不同表达的问题,提升问答系统的召回率。例如,可以对用户提交给问答系统的问题进行在线改写,然后都提交给问答系统召回结果;或者,也可以对知识库中的部分文本内容进行复述扩展,并加入知识库。
(3)信息抽取
复述技术能够为抽取系统自动生成大量的抽取模板,从而提高抽取系统的性能。
(4)信息检索
与问答系统中的应用类似,复述技术可以用来对查询词进行改写和扩展,从而优化信息检索的质量。
(5)自动摘要
在自动摘要任务中,复述技术可以用来计算句子的相似度,从而更好地进行句子聚类、选择等;此外,与机器翻译中的应用类似,复述技术可以改进自动摘要的评价。
需要说明的是,本申请实施例中的语句复述方法在上述各领域中均可以应用。
在自然语言处理中,复述生成主要涉及两个方面:复述(语句)的质量(quality)和复述(语句)的多样性(diversity)。
复述的质量是指:生成的复述语句是否流畅,并与输入语句保持语义一致。例如,输入语句为“太阳到地球的距离是多少”,生成的复述语句为“地球与太阳相距多远”,该复述语句是流畅的、且与输入语句是同义的,则为高质量复述生成。若生成的复述语句为“地球的太阳是多少到距离”,则该复述语句不流畅,若生成的复述语句为“月亮到火星的距离是多少”,则该复述语句与输入语句的语义不相关,这两种复述语句均为低质量复述语句。
复述的多样性是指:生成的复述语句是否多样,并具有信息量(Informative)。例如,输入语句为“太阳到地球的距离是多少”,生成的多个复述语句分别为“地球离太阳有多远”、“日地相距多少公里”及“太阳与地球之间有多少千米”,该多个复述语句与输入语句同义、但均与输入语句为不同的表达,则该多个复述语句的多样性较佳。
在训练语句复述模型(语言模型)时,为了保证复述语句的质量和复述语句的多样性,需要使用复述语句对语料对语句复述模型进行训练,但在实际中,往往通过人工校准才能得到高质量的复述语句对语料,这样会导致训练语句复述模型的成本较高。
基于上述问题,本申请实施例提出了一种语句复述方法、训练语句复述模型的方法,能够便捷地获得复述语句。
图7是本申请实施例提供的训练语句复述模型的方法700的示意性流程图。图7所示的方法700可以由图6中的终端设备执行。
在本申请实施例中,可以使用语句复述模型对输入语句(或训练语句)进行复述,获得输入语句的复述语句。
图7所示的方法可以包括步骤710及步骤720,下面分别对这几个步骤进行详细的介绍。
S710,获取训练数据。
其中,所述训练数据可以包括多个语句,所述多个语句的语种不同,且所述多个语句具有相同含义。
例如,所述训练数据可以包括两个语句,这两个语句分别是中文语句“太阳到地球的距离是多少”及英文语句“What is the distance between the sun and the earth”。
可以看出,上述两个语句的语种不同,且这两个语句都表示相同的含义。或者可以说,上述英文语句为上述中文语句的英文译文(或者也可以说,上述中文语句为上述英文语句的中文译文)。
在本申请实施例中,对所述训练数据包括的语句的个数及语句的语种并不限定。
可选地,所述训练数据可以包括一个语句、及该语句对应的不同语种的一个或多个译文。
例如,所述训练数据可以包括一个中文语句、及该中文语句对应的英文译文。
可选地,所述训练数据可以为不同语种的多个语句拼接成的序列,其中,该多个语句可以具有相同的含义。
例如,所述训练数据可以包括由机器翻译语料中的语句及其对应的译文拼接成的序列,即同一个语义、但不同语种的多个语句拼接成的序列。
例如,对于双语平行(中英文)的机器翻译语料,本申请实施例中的训练数据可以指:机器翻译语料中的一个中文语句及其对应的英文语句(该中文语句的英文译文)拼接成的序列。
需要说明的是,这里的拼接指两个语句依次排列,例如,中文语句可以位于其对应的英文语句的前面,或者,英文语句可以位于其对应的中文语句的前面,本申请实施例对此并不限定。
可选地,所述训练数据还可以包括语种类型指示信息。其中,所述语种类型指示信息可以为中文类型指示信息、英文类型指示信息或法文类型指示信息等。
例如,所述训练数据包括中文语句及该中文语句对应的英文译文拼接成的序列时,在该序列中,所述中文语句的前面可以包括中文类型指示信息,该中文类型指示信息可以指示其后的语句为中文语句,比如,该中文类型指示信息可以为“<zh>”。
类似地,所述英文语句(即该中文语句对应的英文译文)的前面可以包括英文类型指示信息,该英文类型指示信息可以指示其后的语句为英文语句,比如,该英文类型指示信息可以为“<en>”。
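作为示意(分隔方式与函数名均为假设,具体格式以实际实现为准),下面的代码演示如何把机器翻译语料中的中文语句及其英文译文、连同语种类型指示信息<zh>和<en>拼接成一条训练数据序列。

```python
def build_training_sequence(zh_sentence: str, en_sentence: str) -> str:
    # 在每个语句前加上语种类型指示信息,再将两个语句依次排列拼接成一个序列
    return f"<zh> {zh_sentence} <en> {en_sentence}"

seq = build_training_sequence(
    "太阳到地球的距离是多少",
    "What is the distance between the sun and the earth",
)
print(seq)
# <zh> 太阳到地球的距离是多少 <en> What is the distance between the sun and the earth
```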
可选地,所述训练数据还可以包括法文、德文、西班牙文等其他语种的语句。例如,所述训练数据可以包括中文语句、该中文语句对应的法文译文、及该中文语句对应的德文译文。
本领域技术人员可以理解,上述关于语种类型指示信息的描述仅为示例而非限定,本申请实施例对此并不限定。
可选地,所述训练数据包括的多个语句中的至少一个语句可以为经过扰动处理的语句。
例如,所述扰动处理可以包括随机删除语句中的词、随机调换语句中词的词序以及随机向语句中插入词中的至少一项。
在本申请实施例中,对所述训练数据进行扰动处理后,再使用该加扰后的训练数据训练所述语句复述模型,可以提高所述语句复述模型的鲁棒性,使得生成的复述语句的语义更加准确,形式更加多样。
所述扰动处理还可以包括对语句进行的其他扰动处理,本申请实施例对此并不限定。
可选地,所述扰动处理可以采用降噪自编码器(denoising auto-encoder,DAE)技术实现。
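作为示意(假设语句已分词为词列表,扰动概率与示例词表均为假设),下面的代码给出包含随机删除词、随机调换相邻词词序、随机插入词三种操作的一种简单实现。

```python
import random

def perturb(tokens, p_del=0.1, p_swap=0.1, p_ins=0.1, vocab=("的", "是", "了")):
    out = list(tokens)
    # 随机删除语句中的词(若全部被删除则退回原语句)
    out = [t for t in out if random.random() > p_del] or list(tokens)
    # 随机调换语句中相邻词的词序
    for i in range(len(out) - 1):
        if random.random() < p_swap:
            out[i], out[i + 1] = out[i + 1], out[i]
    # 随机向语句中插入词(此处从一个示例词表中采样)
    for i in range(len(out), -1, -1):
        if random.random() < p_ins:
            out.insert(i, random.choice(vocab))
    return out

print(perturb(["太阳", "到", "地球", "的", "距离", "是", "多少"]))
```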
在本申请实施例中,在上述S710获取训练数据之前,所述方法700还可以包括S701。S701可以称为预训练过程,具体如下:
S701,获取预训练数据,根据所述预训练数据,训练所述语句复述模型。
其中,所述预训练数据可以包括一个或多个语句,所述预训练数据包括的语句的语种为所述训练数据包括的语句的语种中的一个或多个。
例如,在所述训练数据包括中文语句及英文语句的情况下,所述预训练数据可以包括一个语句,该语句为中文语句或英文语句;或者所述预训练数据可以包括多个语句,该多个语句包括中文语句和/或英文语句。
可选地,在S701中训练所述语句复述模型时,可以将包括一个语句的预训练数据作为所述语句复述模型的输入,同时,可以将该预训练数据中的语句作为该输入的标签(即真值),采用反向传播(backpropagation)算法,不断地获取预训练数据进行迭代训练,直至损失函数收敛,则完成该语句复述模型的训练。
也就是说,在S701中,可以使用预训练数据作为所述语句复述模型的输入,并将该预训练数据中的语句作为该输入的标签,对所述语句复述模型进行训练,以使得所述语句复述模型可以输出流畅的语句。
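作为示意(假设使用PyTorch,此处以"嵌入层+线性层"代替真实的Transformer语言模型,词id与超参数均为示例,并采用标准的自回归方式逐词预测),下面的代码演示以语句本身作为标签、用反向传播迭代训练直至损失收敛的预训练过程。

```python
import torch
import torch.nn as nn

# 用"嵌入层+线性层"充当一个微型语言模型(真实实现可为Transformer语言模型)
vocab_size, hidden = 100, 32
embed = nn.Embedding(vocab_size, hidden)
proj = nn.Linear(hidden, vocab_size)
opt = torch.optim.SGD(list(embed.parameters()) + list(proj.parameters()), lr=0.1)

# 预训练数据中的一个语句,假设已被编码为词id序列(数值仅为示例)
sentence_ids = torch.tensor([5, 17, 23, 9, 41, 2])
inputs, labels = sentence_ids[:-1], sentence_ids[1:]   # 语句本身作为输入的标签(真值)

for _ in range(200):                  # 不断迭代训练,直至损失函数收敛
    logits = proj(embed(inputs))      # 预测每个位置的下一个词
    loss = nn.functional.cross_entropy(logits, labels)
    opt.zero_grad()
    loss.backward()                   # 反向传播(backpropagation)
    opt.step()
print(float(loss))
```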
可选地,在所述预训练数据包括多个语句的情况下,所述预训练数据中的多个语句可以为多种不同的语种,本申请实施例中对此并不限定。
例如,所述预训练数据中的语句可以为非平行语料中的语句,该非平行语料中可以包括多种不同语种的语句。
相应地,通过使用包括不同语种语句的预训练数据对所述语句复述模型进行训练,可以使得所述语句复述模型能够输出该语种(即所述预训练数据中包括的语句的语种)的流畅语句。
具体训练过程可以如下述图9中所述。
在本申请实施例中,使用包括至少一个语句的预训练数据对所述语句复述模型进行训练,可以提高所述语句复述模型生成的语句(复述语句)的流畅性。
可选地,所述语句复述模型可以为语言模型。
例如,所述语句复述模型可以为Transformer语言模型、CNN语言模型或RNN语言模型等,或者,所述语句复述模型也可以为其他深度学习的语言模型,本申请实施例对此并不限定。
其中,CNN语言模型受滤波器固定尺寸的限制,只能对局部的词间关系建模;RNN语言模型受循环迭代计算限制,多个节点难以并行计算。
而相比于CNN语言模型及RNN语言模型,Transformer语言模型可以克服CNN语言模型仅能建立词与词间局部连接的弱点,也可以克服RNN语言模型难以在GPU上并行计算、建立双向连接的弱点,因此,能够提高语句复述模型的整体效率。
关于Transformer语言模型的具体描述可以参考现有技术,本申请实施例不再赘述。
S720,根据所述训练数据,训练语句复述模型。
其中,(训练好的)所述语句复述模型可以用于基于输入语句生成所述输入语句的复述语句。
可选地,所述复述语句与所述输入语句可以具有相同含义,所述复述语句的语种与所述输入语句的语种可以相同;或者,所述复述语句的语种与所述输入语句的语种也可以不同。
例如,所述输入语句为中文语句时,经所述语句复述模型得到的所述输入语句的复述语句可以为法文语句,但该中文语句与该法文语句具有相同含义(即该中文语句与该法文语句为同一语义在不同语种下的不同表达)。
可选地,所述复述语句的语种可以是根据所述语种指示参数确定的。
可选地,在S720中训练所述语句复述模型时,可以将包括多个语句的训练数据作为所述语句复述模型的输入,同时,可以将该训练数据中的多个语句作为该输入的标签(即真值),采用反向传播算法,不断地获取训练数据进行迭代训练,直至损失函数收敛,则完成该语句复述模型的训练。
S720中的训练过程与上述S701中的训练过程类似,具体训练过程可以如下述图9中所述。
可选地,所述语句复述模型中还可以包括语种指示参数。其中,所述语种指示参数可以用于指示所述语句复述模型生成的复述语句的语种。
在本申请实施例中,所述语句复述模型中包括语种指示参数,根据所述训练数据及所述语种指示参数,训练所述语句复述模型,可以使得所述语句复述模型能够(根据所述语种指示参数)生成多种不同语种的复述语句。
可选地,所述语种指示参数为所述语句复述模型的参数。
可以看出,若使用Transformer语言模型作为所述语句复述模型时,本申请实施例中的Transformer语言模型与现有技术中的Transformer语言模型区别在于,本申请实施例中的Transformer语言模型包括语种指示参数。
在本申请实施例中,所述方法700还可以包括:获取语种指示信息。其中,所述语种指示信息可以用于确定所述语句复述模型生成的复述语句的语种。
可选地,所述根据所述训练数据,训练语句复述模型,可以包括:
根据所述语种指示信息确定所述语种指示参数;根据所述训练数据及所述语种指示参数,训练所述语句复述模型。
在本申请实施例中,通过所述语种指示信息设置所述语种指示参数,根据所述语种指示参数,能够灵活地、便捷地控制所述语句复述模型生成的复述语句的语种。
应理解,上述根据所述语种指示信息确定所述语种指示参数可以是指:根据所述语种指示信息设置所述语种指示参数。从而可以根据设置好的所述语种指示参数控制所述语句复述模型生成的复述语句的语种。
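作为示意(假设语种指示参数以可学习的嵌入向量形式实现,语种编号与维度均为示例),下面的代码演示如何根据语种指示信息确定(设置)对应的语种指示参数。

```python
import torch
import torch.nn as nn

# 语种指示参数:每个语种对应一个可学习的参数向量(对应h_zh、h_en、h_de、h_es等)
lang2id = {"zh": 0, "en": 1, "de": 2, "es": 3}
lang_embedding = nn.Embedding(len(lang2id), 512)

def get_lang_param(lang_info: str) -> torch.Tensor:
    # 根据语种指示信息(如"zh")确定对应的语种指示参数
    idx = torch.tensor([lang2id[lang_info]])
    return lang_embedding(idx)

h_zh = get_lang_param("zh")   # 用于控制生成中文复述语句
print(h_zh.shape)             # torch.Size([1, 512])
```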
在本申请实施例中,所述训练数据包括语种不同、且具有相同含义的多个语句,直接使用该训练数据训练语句复述模型,而并不依赖复述语句对语料对语句复述模型进行训练,可以降低训练语句复述模型的成本,从而能够便捷地获得复述语句。
图8是本申请实施例提供的训练语句复述模型的方法800的示意性流程图。图8所示的方法800可以由图6中的终端设备执行。
在本申请实施例中,可以使用语句复述模型对输入语句(或训练语句)进行复述,获得输入语句的复述语句。
图8所示的方法800可以包括步骤810、步骤820及步骤830,下面分别对这几个步骤进行详细的介绍。
S810,使用非平行语料对语句复述模型进行预训练。
可选地,可以使用多语种非平行语料$\{S_i\}_{i=1}^{n}$来预训练所述语句复述模型,其中,S_i可以为任意语种的语句,n、i均为正整数。
可选地,S_i可以为各种不同语种的语句,例如,S_i可以为中文语句"太阳到地球的距离是多少",或者,S_i也可以为英文语句"What is the distance between the sun and the earth",本申请实施例对S_i的语种并不限定。
可选地,所述语句复述模型可以为Transformer语言模型。需要说明的是,本申请实施例中的Transformer语言模型带有语种指示参数,与现有技术中的Transformer语言模型不同。
如图9所示,以输入语句为中文语句为例,例如,可以将输入语句"太阳到地球的距离是多少"输入到所述语句复述模型,经过处理可以得到该输入语句的多个词向量,将该多个词向量与该输入语句的词位置编码向量进行线性组合,可以得到该输入语句的向量E。
进一步地,将该输入语句的向量E经过语句复述模型(如图9中的由注意力机制连接的多层Transformer模块)计算,可以得到一组与该输入语句上下文相关的隐藏状态向量E_T。
再将E_T与语种指示参数h_zh(中文指示参数)结合,可以得到隐藏状态向量(hidden state)H,基于各个时刻的隐藏状态向量H可以预测当前时刻的生成词。
所述语句复述模型可以表示为:
$p_{\theta}(y_t \mid y_{<t}, x, h_{zh})$
其中,θ为所述语句复述模型中待学习的参数,h_zh为语种指示参数,x为输入语句,y_t为t时刻的生成词,t为正整数。
其中,所述语种指示参数可以包括多个不同语种的语种指示参数,例如,所述语种指示参数可以包括中文指示参数h_zh、英文指示参数h_en、德文指示参数h_de或西班牙文指示参数h_es等中的一个或多个。
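结合图9,下面给出上述计算过程的一个示意性实现(假设使用PyTorch,模型维度、层数等均为示例,并假设以向量相加的方式将E_T与语种指示参数结合,具体结合方式以实际实现为准)。

```python
import torch
import torch.nn as nn

vocab_size, d_model, n_layer, seq_len = 1000, 256, 4, 10   # 规模仅为示意
tok_embed = nn.Embedding(vocab_size, d_model)               # 词向量
pos_embed = nn.Embedding(512, d_model)                      # 词位置编码向量
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
    num_layers=n_layer,
)                                                            # 由注意力机制连接的多层Transformer模块
lang_embed = nn.Embedding(4, d_model)                        # 语种指示参数(h_zh、h_en、h_de、h_es)
out_proj = nn.Linear(d_model, vocab_size)                    # 用于预测当前时刻的生成词

token_ids = torch.randint(0, vocab_size, (1, seq_len))       # 输入语句的词id(示例)
positions = torch.arange(seq_len).unsqueeze(0)
E = tok_embed(token_ids) + pos_embed(positions)              # 输入语句的向量E
mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)  # 自回归掩码
E_T = encoder(E, mask=mask)                                  # 上下文相关的隐藏状态向量E_T
h_zh = lang_embed(torch.tensor([0]))                         # 中文指示参数h_zh
H = E_T + h_zh                                               # 隐藏状态向量H
logits = out_proj(H)                                         # 各时刻生成词的预测分布
print(logits.shape)                                          # torch.Size([1, 10, 1000])
```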
经过S810中的预训练,所述语句复述模型可以按指定语种,生成流畅的自然语言语句,但还不能生成输入语句的复述语句。
S810中的预训练过程可以采用反向传播算法,不断地获取预训练数据进行迭代训练,直至损失函数收敛,则完成该语句复述模型的预训练。
为了能生成复述语句,所述语句复述模型还需要进一步在机器翻译的平行语料$\{(X_i,Y_i)\}_{i=1}^{m}$上训练。
S820,使用机器翻译的平行语料对语句复述模型进行训练。
可选地,可以使用机器翻译的平行语料$\{(X_i,Y_i)\}_{i=1}^{m}$来训练所述语句复述模型,其中,语句Y_i可以为输入语句X_i的译文,m、i均为正整数。
可选地,X_i可以为各种不同语种的语句,例如,X_i可以为中文语句"太阳到地球的距离是多少",或者,X_i也可以为英文语句"What is the distance between the sun and the earth",本申请实施例对X_i的语种并不限定。
同样地,本申请实施例对Y_i的语种也不作限定,即语句Y_i可以为输入语句X_i的任意语种的译文。
可选地,可以将输入语句X_i及其对应的译文Y_i拼接成序列。
如图10所示,可以将中文语句“太阳到地球的距离是多少”及其英文译文“What is the distance between the sun and the earth”拼接成一个序列,并将该序列输入上述经S810预训练后的所述语句复述模型。
例如,如图10所示,可以使用中文类型指示信息<zh>指示其后的输入语句为中文,可以使用英文类型指示信息<en>指示其后的输入语句为英文。
类似地,如图10所示,所述语种指示参数h_zh可以控制其对应的输入语句(或词)为中文,所述语种指示参数h_en可以控制其对应的输入语句(或词)为英文。
在本申请实施例中,在使用语句复述模型对输入语句X_i及其对应的译文Y_i拼接成的序列进行处理之前,还可以对输入语句X_i及其对应的译文Y_i中的至少一个进行扰动处理。
可选地,所述扰动处理可以包括随机删除语句中的词、随机调换语句的词序以及随机向语句中插入词中的至少一项。所述扰动处理还可以包括对语句进行的其他扰动处理,本申请实施例对此并不限定。
例如,可以使用降噪自编码器(denoising auto-encoder,DAE)技术对输入语句X_i及其对应的译文Y_i中的至少一个进行扰动处理。
S820中的训练过程与S810中的预训练过程类似,采用反向传播算法,不断地获取训练数据进行迭代训练,直至损失函数收敛,则完成该语句复述模型的训练。
训练完成后,可以控制所述语句复述模型生成的复述语句的语种。
S830,对语句复述模型进行测试。
当所述语句复述模型经过上述训练过程后,就可以进入测试阶段,即用于实际的复述语句的生成。
在本申请实施例中,对于给定的一个输入语句,可以使用所述语句复述模型生成一个或多个复述语句。
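作为示意(其中的"模型"仅为一个随机占位函数,接口与参数均为假设),下面的代码演示测试阶段自回归解码的基本流程:基于输入语句与语种指示参数逐词采样,即可为同一输入语句生成一个或多个复述语句。

```python
import torch

def generate(model, input_ids, lang_id, max_len=20, num_return=3, eos_id=2):
    # 自回归解码:基于输入语句与语种指示参数,按词采样,生成多个复述语句(词id序列)
    paraphrases = []
    for _ in range(num_return):
        out = []
        for _ in range(max_len):
            logits = model(input_ids, torch.tensor(out, dtype=torch.long), lang_id)
            next_id = torch.multinomial(torch.softmax(logits, dim=-1), 1).item()
            if next_id == eos_id:
                break
            out.append(next_id)
        paraphrases.append(out)
    return paraphrases

# 用一个随机"模型"作为占位,仅演示解码流程;实际使用时替换为训练好的语句复述模型
dummy_model = lambda src, prev, lang: torch.randn(100)
print(generate(dummy_model, torch.tensor([5, 7, 9]), lang_id=0))
```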
在本申请实施例中,对语句复述模型进行训练时,并不依赖复述语句对语料,而是直接使用上述训练数据对语句复述模型进行训练,因此,可以降低训练语句复述模型的成本,从而能够便捷地获得复述语句。
图11示出了本申请实施例提供的语句复述方法1100的示意性流程图,图11所示的方法可以由图6中的终端设备执行。图11所示的方法包括步骤1110及步骤1120,下面分别对这几个步骤进行详细的介绍。
S1110,获取输入语句。
其中,所述输入语句可以为词汇、短语或句子,同时,所述输入语句也可以为各种语言,本申请实施例对此并不限定。
S1120,通过语句复述模型,对所述输入语句进行复述,生成所述输入语句的复述语句。
其中,所述复述语句与所述输入语句可以具有相同含义,所述复述语句的语种与所述输入语句的语种可以相同;或者,所述复述语句的语种与所述输入语句的语种也可以不同。
可选地,所述复述语句的语种可以是根据所述语种指示参数确定的。
可选地,所述输入语句的复述语句可以为词汇、短语或句子,同时,所述复述语句也可以为各种语言,本申请实施例对此并不限定。
其中,所述语句复述模型可以是使用训练数据训练后得到的,所述训练数据可以包括多个语句,所述多个语句的语种不同,且所述多个语句具有相同含义。
可选地,所述语句复述模型中包括语种指示参数,所述语种指示参数用于指示所述语句复述模型生成的复述语句的语种,其中,所述语句复述模型是根据所述训练数据及所述语种指示参数训练后得到的。
在本申请实施例中,所述语句复述模型中包括语种指示参数,根据所述训练数据及所述语种指示参数,训练所述语句复述模型,可以使得所述语句复述模型能够(根据所述语种指示参数)生成多种不同语种的复述语句。
可选地,所述语种指示参数是根据获取到的语种指示信息确定的,所述语种指示信息用于确定所述语种指示参数。
在本申请实施例中,通过所述语种指示信息设置所述语种指示参数,根据所述语种指示参数,能够灵活地、便捷地控制所述语句复述模型生成的复述语句的语种。
可选地,所述训练数据包括的多个语句中的至少一个语句为经过扰动处理的语句,所述扰动处理包括随机删除语句中的词、随机调换语句中词的词序以及随机向语句中插入词中的至少一项。
在本申请实施例中,对所述训练数据进行扰动处理后,再使用该加扰后的训练数据训练所述语句复述模型,可以提高所述语句复述模型的鲁棒性,使得生成的复述语句的语义更加准确,形式更加多样。
可选地,所述语句复述模型是使用预训练数据训练后、再使用所述训练数据训练后得到的,所述预训练数据包括一个或多个语句,所述预训练数据包括的语句的语种为所述训练数据包括的语句的语种中的一个或多个。
在本申请实施例中,使用包括至少一个语句的预训练数据对所述语句复述模型进行训练,可以提高所述语句复述模型生成的语句(复述语句)的流畅性。
需要说明的是,图11中的所述语句复述模型可以是使用图7或图8所示的训练语句复述模型的方法进行训练后得到的。
在本申请实施例中,所述训练数据包括语种不同、且具有相同含义的多个语句,所述语句复述模型是直接使用该训练数据训练后得到的,而并不依赖复述语句对语料对语句复述模型进行训练,因此可以降低训练语句复述模型的成本,从而能够便捷地获得复述语句。
图12是本申请实施例的语句复述装置的硬件结构示意图。图12所示的语句复述装置4000包括存储器4001、处理器4002、通信接口4003以及总线4004。其中,存储器4001、处理器4002、通信接口4003通过总线4004实现彼此之间的通信连接。
应理解,图12所示的装置仅示例而非限定,所述语句复述装置4000可以包括更多或更少的模块或单元,本申请实施例中对此并不限定。
存储器4001可以是只读存储器(read only memory,ROM),静态存储设备,动态存储设备或者随机存取存储器(random access memory,RAM)。存储器4001可以存储程序,当存储器4001中存储的程序被处理器4002执行时,处理器4002和通信接口4003用于执行本申请实施例的语句复述方法的各个步骤。
处理器4002可以采用通用的中央处理器(central processing unit,CPU),微处理器,应用专用集成电路(application specific integrated circuit,ASIC),图形处理器(graphics processing unit,GPU)或者一个或多个集成电路,用于执行相关程序,以实现本申请实施例的语句复述装置中的单元所需执行的功能,或者执行本申请方法实施例的语句复述方法。
处理器4002还可以是一种集成电路芯片,具有信号的处理能力。在实现过程中,本申请实施例的语句复述方法的各个步骤可以通过处理器4002中的硬件的集成逻辑电路或者软件形式的指令完成。
上述处理器4002还可以是通用处理器、数字信号处理器(digital signal processing,DSP)、ASIC、现成可编程门阵列(field programmable gate array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。上述通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器4001,处理器4002读取存储器4001中的信息,结合其硬件完成本申请实施例的语句复述装置中包括的单元所需执行的功能,或者执行本申请方法实施例的语句复述方法。
通信接口4003使用例如但不限于收发器一类的收发装置,来实现装置4000与其他设备或通信网络之间的通信。例如,可以通过通信接口4003获取输入语句。
总线4004可包括在装置4000各个部件(例如,存储器4001、处理器4002、通信接口4003)之间传送信息的通路。
图13是本申请实施例的训练语句复述模型的装置5000的硬件结构示意图。与上述装置4000类似,图13所示的训练语句复述模型的装置5000包括存储器5001、处理器5002、通信接口5003以及总线5004。其中,存储器5001、处理器5002、通信接口5003通过总线5004实现彼此之间的通信连接。
应理解,图13所示的装置仅示例而非限定,所述训练语句复述模型的装置5000可以包括更多或更少的模块或单元,本申请实施例中对此并不限定。
存储器5001可以存储程序,当存储器5001中存储的程序被处理器5002执行时,处理器5002用于执行本申请实施例的训练语句复述模型的方法的各个步骤。
处理器5002可以采用通用的CPU,微处理器,ASIC,GPU或者一个或多个集成电路,用于执行相关程序,以实现本申请实施例的训练语句复述模型的方法。
处理器5002还可以是一种集成电路芯片,具有信号的处理能力。在实现过程中,本申请实施例的训练语句复述模型的方法的各个步骤可以通过处理器5002中的硬件的集成逻辑电路或者软件形式的指令完成。
应理解,通过图13所示的训练语句复述模型的装置5000对语句复述模型进行训练,训练得到的语句复述模型就可以用于执行本申请实施例的语句复述方法了。具体地,通过装置5000对语句复述模型进行训练能够得到图11所示的方法中的语句复述模型。
具体地,图13所示的装置可以通过通信接口5003从外界获取训练数据以及待训练的语句复述模型,然后由处理器根据训练数据对待训练的语句复述模型进行训练。
应注意,尽管上述装置4000和装置5000仅仅示出了存储器、处理器、通信接口,但是在具体实现过程中,本领域的技术人员应当理解,装置4000和装置5000还可以包括实现正常运行所必须的其他器件。
同时,根据具体需要,本领域的技术人员应当理解,装置4000和装置5000还可包括实现其他附加功能的硬件器件。此外,本领域的技术人员应当理解,装置4000和装置5000也可仅仅包括实现本申请实施例所必须的器件,而不必包括图12和图13中所示的全部器件。
应理解,本申请实施例中的处理器可以为中央处理单元(central processing unit,CPU),该处理器还可以是其他通用处理器、数字信号处理器(digital signal processor,DSP)、专用集成电路(application specific integrated circuit,ASIC)、现成可编程门阵列(field programmable gate array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。
还应理解,本申请实施例中的存储器可以是易失性存储器或非易失性存储器,或可包括易失性和非易失性存储器两者。其中,非易失性存储器可以是只读存储器(read-only memory,ROM)、可编程只读存储器(programmable ROM,PROM)、可擦除可编程只读存储器(erasable PROM,EPROM)、电可擦除可编程只读存储器(electrically EPROM,EEPROM)或闪存。易失性存储器可以是随机存取存储器(random access memory,RAM),其用作外部高速缓存。通过示例性但不是限制性说明,许多形式的随机存取存储器(random access memory,RAM)可用,例如静态随机存取存储器(static RAM,SRAM)、动态随机存取存储器(DRAM)、同步动态随机存取存储器(synchronous DRAM,SDRAM)、双倍数据速率同步动态随机存取存储器(double data rate SDRAM,DDR SDRAM)、增强型同步动态随机存取存储器(enhanced SDRAM,ESDRAM)、同步连接动态随机存取存储器(synchlink DRAM,SLDRAM)和直接内存总线随机存取存储器(direct rambus RAM,DR RAM)。
上述实施例,可以全部或部分地通过软件、硬件、固件或其他任意组合来实现。当使用软件实现时,上述实施例可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机指令或计算机程序。在计算机上加载或执行所述计算机指令或计算机程序时,全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机 可以为通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集合的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质。半导体介质可以是固态硬盘。
应理解,本文中术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况,其中A,B可以是单数或者复数。另外,本文中字符“/”,一般表示前后关联对象是一种“或”的关系,但也可能表示的是一种“和/或”的关系,具体可参考前后文进行理解。
本申请中,“至少一个”是指一个或者多个,“多个”是指两个或两个以上。“以下至少一项(个)”或其类似表达,是指的这些项中的任意组合,包括单项(个)或复数项(个)的任意组合。例如,a,b,或c中的至少一项(个),可以表示:a,b,c,a-b,a-c,b-c,或a-b-c,其中a,b,c可以是单个,也可以是多个。
应理解,在本申请的各种实施例中,上述各过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现 有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(read-only memory,ROM)、随机存取存储器(random access memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (24)

  1. 一种训练语句复述模型的方法,其特征在于,包括:
    获取训练数据,所述训练数据包括多个语句,所述多个语句的语种不同,且所述多个语句具有相同含义;
    根据所述训练数据,训练语句复述模型,所述语句复述模型用于基于输入语句生成所述输入语句的复述语句,所述复述语句与所述输入语句具有相同含义,所述复述语句的语种与所述输入语句的语种相同或不同。
  2. 根据权利要求1所述的方法,其特征在于,所述语句复述模型中包括语种指示参数,所述语种指示参数用于指示所述语句复述模型生成的复述语句的语种;
    其中,所述根据所述训练数据,训练语句复述模型,包括:
    根据所述训练数据及所述语种指示参数,训练所述语句复述模型。
  3. 根据权利要求2所述的方法,其特征在于,所述复述语句的语种是根据所述语种指示参数确定的,所述方法还包括:
    获取语种指示信息;
    根据所述语种指示信息确定所述语种指示参数。
  4. 根据权利要求1至3中任一项所述的方法,其特征在于,所述训练数据包括的多个语句中的至少一个语句为经过扰动处理的语句,所述扰动处理包括随机删除语句中的词、随机调换语句中词的词序以及随机向语句中插入词中的至少一项。
  5. 根据权利要求1至4中任一项所述的方法,其特征在于,在所述获取训练数据之前,所述方法还包括:
    获取预训练数据,所述预训练数据包括一个或多个语句,所述预训练数据包括的语句的语种为所述训练数据包括的语句的语种中的一个或多个;
    根据所述预训练数据,训练所述语句复述模型。
  6. 一种语句复述方法,其特征在于,包括:
    获取输入语句;
    通过语句复述模型,对所述输入语句进行复述,生成所述输入语句的复述语句,所述复述语句的语种与所述输入语句的语种相同或不同;
    其中,所述语句复述模型是使用训练数据训练后得到的,所述训练数据包括多个语句,所述多个语句的语种不同,且所述多个语句具有相同含义。
  7. 根据权利要求6所述的方法,其特征在于,所述语句复述模型中包括语种指示参数,所述语种指示参数用于指示所述语句复述模型生成的复述语句的语种,其中,所述语句复述模型是根据所述训练数据及所述语种指示参数训练后得到的。
  8. 根据权利要求7所述的方法,其特征在于,所述复述语句的语种是根据所述语种指示参数确定的,所述语种指示参数是根据获取到的语种指示信息确定的。
  9. 根据权利要求6至8中任一项所述的方法,其特征在于,所述训练数据包括的多个语句中的至少一个语句为经过扰动处理的语句,所述扰动处理包括随机删除语句中的词、随机调换语句中词的词序以及随机向语句中插入词中的至少一项。
  10. 根据权利要求6至9中任一项所述的方法,其特征在于,所述语句复述模型是使用预训练数据训练后、再使用所述训练数据训练后得到的,所述预训练数据包括一个或多个语句,所述预训练数据包括的语句的语种为所述训练数据包括的语句的语种中的一个或多个。
  11. 一种训练语句复述模型的装置,其特征在于,包括:
    获取模块,用于获取训练数据,所述训练数据包括多个语句,所述多个语句的语种不同,且所述多个语句具有相同含义;
    训练模块,用于根据所述训练数据,训练语句复述模型,所述语句复述模型用于基于输入语句生成所述输入语句的复述语句,所述复述语句与所述输入语句具有相同含义,所述复述语句的语种与所述输入语句的语种相同或不同。
  12. 根据权利要求11所述的装置,其特征在于,所述语句复述模型中包括语种指示参数,所述语种指示参数用于指示所述语句复述模型生成的复述语句的语种;
    其中,所述训练模块具体用于:
    根据所述训练数据及所述语种指示参数,训练所述语句复述模型。
  13. 根据权利要求12所述的装置,其特征在于,所述复述语句的语种是根据所述语种指示参数确定的,所述获取模块还用于:获取语种指示信息;
    所述训练模块还用于:根据所述语种指示信息确定所述语种指示参数。
  14. 根据权利要求11至13中任一项所述的装置,其特征在于,所述训练数据包括的多个语句中的至少一个语句为经过扰动处理的语句,所述扰动处理包括随机删除语句中的词、随机调换语句中词的词序以及随机向语句中插入词中的至少一项。
  15. 根据权利要求11至14中任一项所述的装置,其特征在于,所述获取模块还用于:
    获取预训练数据,所述预训练数据包括一个或多个语句,所述预训练数据包括的语句的语种为所述训练数据包括的语句的语种中的一个或多个;
    所述装置还包括预训练模块,用于:
    根据所述预训练数据,训练所述语句复述模型。
  16. 一种语句复述装置,其特征在于,包括:
    获取模块,用于获取输入语句;
    复述模块,用于通过语句复述模型,对所述输入语句进行复述,生成所述输入语句的复述语句,所述复述语句的语种与所述输入语句的语种相同或不同;
    其中,所述语句复述模型是使用训练数据训练后得到的,所述训练数据包括多个语句,所述多个语句的语种不同,且所述多个语句具有相同含义。
  17. 根据权利要求16所述的装置,其特征在于,所述语句复述模型中包括语种指示参数,所述语种指示参数用于指示所述语句复述模型生成的复述语句的语种,其中,所述语句复述模型是根据所述训练数据及所述语种指示参数训练后得到的。
  18. 根据权利要求17所述的装置,其特征在于,所述复述语句的语种是根据所述语种指示参数确定的,所述语种指示参数是根据获取到的语种指示信息确定的。
  19. 根据权利要求16至18中任一项所述的装置,其特征在于,所述训练数据包括的多个语句中的至少一个语句为经过扰动处理的语句,所述扰动处理包括随机删除语句中的词、随机调换语句中词的词序以及随机向语句中插入词中的至少一项。
  20. 根据权利要求16至19中任一项所述的装置,其特征在于,所述语句复述模型是使用预训练数据训练后、再使用所述训练数据训练后得到的,所述预训练数据包括一个或多个语句,所述预训练数据包括的语句的语种为所述训练数据包括的语句的语种中的一个或多个。
  21. 一种训练语句复述模型的装置,其特征在于,包括处理器和存储器,所述存储器用于存储程序指令,所述处理器用于调用所述程序指令来执行权利要求1至5中任一项所述的方法。
  22. 一种语句复述装置,其特征在于,包括处理器和存储器,所述存储器用于存储程序指令,所述处理器用于调用所述程序指令来执行权利要求6至10中任一项所述的方法。
  23. 一种计算机可读存储介质,其特征在于,所述计算机可读介质存储用于设备执行的程序代码,该程序代码包括用于执行如权利要求1至5或6至10中任一项所述的方法。
  24. 一种芯片,其特征在于,所述芯片包括处理器与数据接口,所述处理器通过所述数据接口读取存储器上存储的指令,以执行如权利要求1至5或6至10中任一项所述的方法。
PCT/CN2020/125131 2019-11-01 2020-10-30 训练语句复述模型的方法、语句复述方法及其装置 WO2021083312A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911061874.7A CN112784003A (zh) 2019-11-01 2019-11-01 训练语句复述模型的方法、语句复述方法及其装置
CN201911061874.7 2019-11-01

Publications (1)

Publication Number Publication Date
WO2021083312A1 true WO2021083312A1 (zh) 2021-05-06

Family

ID=75714462

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/125131 WO2021083312A1 (zh) 2019-11-01 2020-10-30 训练语句复述模型的方法、语句复述方法及其装置

Country Status (2)

Country Link
CN (1) CN112784003A (zh)
WO (1) WO2021083312A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114997140B (zh) * 2021-09-17 2023-04-28 荣耀终端有限公司 校验语义的方法和装置

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109858044B (zh) * 2019-02-01 2023-04-18 成都金山互动娱乐科技有限公司 语言处理方法和装置、语言处理系统的训练方法和装置

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180121419A1 (en) * 2016-10-31 2018-05-03 Samsung Electronics Co., Ltd. Apparatus and method for generating sentence
US20180329883A1 (en) * 2017-05-15 2018-11-15 Thomson Reuters Global Resources Unlimited Company Neural paraphrase generator
CN109710915A (zh) * 2017-10-26 2019-05-03 华为技术有限公司 复述语句生成方法及装置

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHONG ZHOU; MATTHIAS SPERBER; ALEX WAIBEL: "Paraphrases as Foreign Languages in Multilingual Neural Machine Translation", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 25 August 2018 (2018-08-25), 201 Olin Library Cornell University Ithaca, NY 14853, XP080899958 *

Also Published As

Publication number Publication date
CN112784003A (zh) 2021-05-11

Similar Documents

Publication Publication Date Title
CN112487182B (zh) 文本处理模型的训练方法、文本处理方法及装置
WO2022007823A1 (zh) 一种文本数据处理方法及装置
WO2020228376A1 (zh) 文本处理方法、模型训练方法和装置
WO2021057884A1 (zh) 语句复述方法、训练语句复述模型的方法及其装置
WO2022068627A1 (zh) 一种数据处理方法及相关设备
WO2022057776A1 (zh) 一种模型压缩方法及装置
CN111368993B (zh) 一种数据处理方法及相关设备
WO2022001805A1 (zh) 一种神经网络蒸馏方法及装置
WO2021208612A1 (zh) 数据处理的方法与装置
WO2022001724A1 (zh) 一种数据处理方法及装置
WO2022253061A1 (zh) 一种语音处理方法及相关设备
WO2021129411A1 (zh) 文本处理方法及装置
WO2023165361A1 (zh) 一种数据处理方法及相关设备
WO2020192523A1 (zh) 译文质量检测方法、装置、机器翻译系统和存储介质
WO2021136058A1 (zh) 一种处理视频的方法及装置
CN113505193A (zh) 一种数据处理方法及相关设备
US20240152770A1 (en) Neural network search method and related device
WO2024114659A1 (zh) 一种摘要生成方法及其相关设备
WO2021083312A1 (zh) 训练语句复述模型的方法、语句复述方法及其装置
CN117453949A (zh) 一种视频定位方法以及装置
CN110334359B (zh) 文本翻译方法和装置
CN116109449A (zh) 一种数据处理方法及相关设备
CN113095072B (zh) 文本处理方法及装置
WO2023143262A1 (zh) 一种数据处理方法及相关设备
CN114817452A (zh) 一种语义匹配方法、装置、设备及可存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20882639

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20882639

Country of ref document: EP

Kind code of ref document: A1