WO2022142823A1 - Human-machine dialogue method, apparatus, computer device and readable storage medium - Google Patents

Human-machine dialogue method, apparatus, computer device and readable storage medium

Info

Publication number
WO2022142823A1
Authority
WO
WIPO (PCT)
Prior art keywords
language
sentence
conversation
round
output
Prior art date
Application number
PCT/CN2021/131221
Other languages
English (en)
French (fr)
Inventor
马力 (MA Li)
Original Assignee
深圳市优必选科技股份有限公司 (UBTECH Robotics Corp Ltd)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市优必选科技股份有限公司
Publication of WO2022142823A1
Priority to US17/870,813 (published as US20220358297A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/40 Processing or translation of natural language
    • G06F 40/42 Data-driven translation
    • G06F 40/47 Machine-assisted translation, e.g. using translation memory
    • G06F 40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/332 Query formulation
    • G06F 16/3329 Natural language query formulation or dialogue systems
    • G06F 16/3331 Query processing
    • G06F 16/3332 Query translation
    • G06F 16/3337 Translation of the query language, e.g. Chinese to English
    • G06F 16/334 Query execution
    • G06F 16/3343 Query execution using phonetics
    • G06F 16/3344 Query execution using natural language analysis
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/903 Querying
    • G06F 16/9032 Query formulation
    • G06F 16/90332 Natural language query formulation or dialogue systems
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning

Definitions

  • the present application relates to the technical field of man-machine dialogue, and in particular, to a man-machine dialogue method, apparatus, computer equipment and readable storage medium.
  • human-machine dialogue technology has been widely used in chat robots, mobile assistants, intelligent customer service, voice navigation and other scenarios.
  • conducting correct and smooth multi-round conversations is an important effect that human-machine dialogue technology aims to achieve.
  • for high-resource languages such as English, open-domain multi-round conversation data is abundant, while for low-resource languages such as Chinese, Japanese and Korean, open-domain multi-round conversation data is relatively scarce, which means the multi-round conversation generation models for high-resource languages are inevitably more mature than those for low-resource languages.
  • how to achieve correct and smooth multi-round human-machine conversation for low-resource languages is a technical problem that needs to be solved urgently.
  • the purpose of this application includes providing a human-machine dialogue method, device, computer equipment and readable storage medium, which can realize multi-round human-machine conversation in a low-resource language by borrowing the multi-round conversation generation model of a high-resource language, while improving the cohesion and coherence, in terms of semantics and context, of the content of multi-round human-machine conversations in the low-resource language.
  • the present application provides a man-machine dialogue method, the method comprising:
  • according to the first language dialogue content of the historical conversation rounds and the second language dialogue content that has a mutual translation relationship with the first language dialogue content, translating the first language input sentence of the current conversation round to obtain the second language input sentence of the current conversation round, wherein the first language dialogue content includes the first language input sentence and the first language output sentence corresponding to each conversation round, and the second language dialogue content includes the second language input sentence and the second language output sentence corresponding to each conversation round;
  • calling the pre-stored multi-round conversation generation model to parse the second language input sentence of the current conversation round according to the second language dialogue content of the historical conversation rounds, and generating the second language output sentence of the current conversation round, wherein the multi-round conversation generation model is obtained by training on multi-round conversation material in the second language;
  • according to the first language dialogue content and the second language dialogue content of the historical conversation rounds, and the first language input sentence and the second language input sentence of the current conversation round, translating the second language output sentence of the current conversation round to obtain at least one first language candidate result;
  • the first language output sentence of the current session round is determined from the at least one first language candidate result for output.
  • the step of determining the output sentence in the first language of the current conversation round from the at least one first language candidate result for output includes:
  • For each first language candidate result, call the pre-stored coherence evaluation model to calculate the expression coherence of the first language candidate result according to the first language dialogue content of the historical conversation rounds and the first language input sentence of the current conversation round;
  • select the first language candidate result with the largest expression coherence, and output it as the first language output sentence of the current conversation round.
  • the method further includes:
  • obtaining a plurality of valid conversation material samples in the first language, wherein each valid conversation material sample includes the first language dialogue content of multiple consecutive conversation rounds, and, for each valid conversation material sample, constructing a negative conversation material sample corresponding to the valid conversation material sample;
  • the initial classification model is trained by using the obtained plurality of valid conversation material samples and the plurality of negative conversation material samples to obtain the coherence evaluation model.
  • the step of constructing a negative conversation material sample corresponding to the valid conversation material sample includes:
  • the target first language output sentence in the valid conversation material sample is replaced by the negative first language output sentence to obtain the negative conversation material sample.
  • the step of determining, according to the comparison result, a negative first language output sentence matching the target first language output sentence includes:
  • the first language expression sentence is used as the negative first language output sentence.
  • the present application provides a human-machine dialogue device, the device comprising:
  • the input sentence obtaining module is used to obtain the first language input sentence of the current session round
  • the input sentence translation module is used to translate the first language input sentence of the current session round according to the first language dialogue content of the historical conversation rounds and the second language dialogue content that has a mutual translation relationship with the first language dialogue content, to obtain the second language input sentence of the current session round, wherein the first language dialogue content includes the first language input sentence and the first language output sentence corresponding to each session round, and the second language dialogue content includes the second language input sentence and the second language output sentence corresponding to each session round;
  • the output sentence generation module is used to call the pre-stored multi-round conversation generation model to parse the second language input sentence of the current conversation round according to the second language dialogue content of the historical conversation rounds, and generate the second language output sentence of the current conversation round, wherein the multi-round conversation generation model is obtained by training on multi-round conversation material in the second language;
  • the output sentence translation module is used to translate the second language output sentence of the current session round according to the first language dialogue content and the second language dialogue content of the historical conversation rounds, and the first language input sentence and the second language input sentence of the current session round, to obtain at least one first language candidate result;
  • the output sentence reply module is configured to determine the first language output sentence of the current session round from the at least one first language candidate result for output.
  • the output sentence reply module includes:
  • the sentence coherence calculation sub-module is used to call the pre-stored coherence evaluation model for each first language candidate result, and calculate the expression coherence of the first language candidate result according to the first language dialogue content of the historical conversation rounds and the first language input sentence of the current conversation round;
  • the sentence selection and output sub-module is used to select the first language candidate result with the largest expression coherence, and output it as the first language output sentence of the current conversation round.
  • the device further comprises:
  • a valid sample acquisition module configured to acquire a plurality of valid conversation material samples using the first language, wherein each valid conversation material sample includes the first language dialogue content of multiple consecutive conversation rounds;
  • a negative sample construction module used for constructing a negative conversation material sample corresponding to the valid conversation material sample for each valid conversation material sample
  • the classification model training module is used for training the initial classification model by using the obtained multiple valid conversation material samples and the plurality of negative conversation material samples to obtain the coherence evaluation model.
  • the negative sample construction module includes:
  • a target sentence extraction sub-module used for extracting the target first language output sentence corresponding to the last conversation round from the valid conversation material sample
  • a target sentence translation submodule configured to translate the target first language output sentence to obtain a corresponding second language expression sentence
  • an expression sentence translation submodule used for translating the second language expression sentence to obtain a corresponding first language expression sentence
  • an edit distance calculation submodule for calculating the minimum edit distance between the target first language output sentence and the first language expression sentence
  • a negative statement determining submodule configured to compare the calculated minimum edit distance with a preset distance threshold, and determine a negative first language output statement matching the target first language output statement according to the comparison result;
  • the target sentence replacement submodule is configured to replace the target first language output sentence in the valid conversation material sample with the negative first language output sentence to obtain the negative conversation material sample.
  • the manner in which the negative sentence determination submodule determines, according to the comparison result, the negative first language output sentence matching the target first language output sentence includes: using the first language expression sentence as the negative first language output sentence.
  • the present application provides a computer device, the computer device includes a processor and a memory, the memory stores a computer program that can be executed by the processor, and the processor can execute the computer program to implement the human-machine dialogue method described in any one of the foregoing embodiments.
  • the present application provides a readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, implements the man-machine dialogue method described in any one of the foregoing embodiments.
  • after obtaining the first language input sentence of the current session round, the present application translates it based on the first language dialogue content of the historical session rounds and the second language dialogue content that has a mutual translation relationship with that first language dialogue content, to obtain the corresponding second language input sentence; the multi-round conversation generation model based on the second language then parses the second language input sentence according to the second language dialogue content of the historical session rounds, generating the corresponding second language output sentence; next, according to the first language dialogue content and the second language dialogue content of the historical session rounds, and the first language input sentence and the second language input sentence of the current session round, the second language output sentence of the current session round is translated to obtain at least one first language candidate result; finally, the first language output sentence of the current session round is determined from the at least one first language candidate result for output. In this way, the multi-round conversation generation model of the high-resource language can be borrowed to realize multi-round human-machine conversation in the low-resource language, while the inter-translation between low-resource language sentences and high-resource language sentences is performed in combination with the specific conditions of the existing dialogue content, improving the cohesion and coherence, in terms of semantics and context, of the multi-round human-machine conversation content in the low-resource language.
  • FIG. 1 is a schematic diagram of the composition of a computer device provided by an embodiment of the present application.
  • FIG. 2 is one of the schematic flowcharts of the man-machine dialogue method provided by the embodiment of the present application
  • FIG. 3 is a schematic flowchart of the sub-steps included in step S250 in FIG. 2;
  • FIG. 4 is the second schematic flowchart of the man-machine dialogue method provided by the embodiment of the present application.
  • FIG. 5 is a schematic flowchart of the sub-steps included in step S270 in FIG. 4;
  • FIG. 6 is one of the schematic diagrams of the composition of the human-machine dialogue device provided by the embodiment of the present application.
  • FIG. 7 is a schematic diagram of the composition of the output sentence reply module in FIG. 6;
  • FIG. 8 is the second schematic diagram of the composition of the man-machine dialogue device provided by the embodiment of the present application.
  • FIG. 9 is a schematic diagram of the composition of the negative sample construction module in FIG. 8.
  • One of the existing solutions for realizing multi-round human-machine conversations for low-resource languages is to use machine translation technology to directly translate query texts in low-resource languages into query texts in high-resource languages.
  • the multi-round conversation generation model trained with open-domain multi-round conversation data then generates a corresponding high-resource language reply text for the translated high-resource language query text, and the generated high-resource language reply text is directly translated into a reply text in the low-resource language for output, thereby completing one session round of human-machine dialogue in the low-resource language, each session round consisting of an input operation of the query text and an output operation of the reply text in the low-resource language.
  • the embodiments of the present application provide a human-machine dialogue method, device, computer equipment and readable storage medium to implement the aforementioned functions.
  • FIG. 1 is a schematic diagram of the composition of a computer device 10 provided by an embodiment of the present application.
  • the computer device 10 uses the multi-round conversation generation model of the high-resource language to realize multi-round human-machine conversation in the low-resource language, and at the same time combines the specific situation of the existing dialogue content into the inter-translation between low-resource language sentences and high-resource language sentences, so that the final dialogue content in the low-resource language has an excellent degree of cohesion in terms of semantics and context, and the low-resource language can borrow the multi-round conversation generation model of the high-resource language to truly achieve correct and smooth multi-round human-machine conversation operation.
  • the computer device 10 may be, but not limited to, a smart phone, a tablet computer, a personal computer, a server, a dialogue robot, and the like.
  • the computer device 10 may include a memory 11 , a processor 12 , a communication unit 13 and a human-machine dialogue device 100 .
  • the elements of the memory 11 , the processor 12 and the communication unit 13 are directly or indirectly electrically connected to each other to realize data transmission or interaction.
  • the elements of the memory 11, the processor 12 and the communication unit 13 can be electrically connected to each other through one or more communication buses or signal lines.
  • the memory 11 may be, but is not limited to, a random access memory (RAM), a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), etc.
  • the memory 11 is also used to store the multi-round conversation generation model, which is obtained by training on multi-round conversation material in the open domain corresponding to the high-resource language, so that the multi-round conversation generation model can generate, for an input sentence in the high-resource language, a corresponding output sentence as the reply content.
  • the processor 12 may be an integrated circuit chip with signal processing capability.
  • the processor 12 may be a general-purpose processor, including at least one of a central processing unit (CPU), a graphics processing unit (GPU), a network processor (NP), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, and a discrete hardware component.
  • the general-purpose processor may be a microprocessor, or the processor may also be any conventional processor, etc., and may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application.
  • the communication unit 13 is configured to establish a communication connection between the computer device 10 and other electronic devices through a network, and to send and receive data through the network, wherein the network includes a wired communication network and a wireless communication network.
  • the computer device 10 can obtain the query text in the low-resource language from the client device used by the user through the communication unit 13, and after generating the corresponding reply text for that query text, output the reply text to the client device through the communication unit 13; the computer device 10 can also transmit the reply text in the low-resource language generated by itself to an audio playback device for voice playback through the communication unit 13.
  • the human-machine dialogue device 100 includes at least one software function module that can be stored in the memory 11 or fixed in the operating system of the computer device 10 in the form of software or firmware.
  • the processor 12 may be configured to execute executable modules stored in the memory 11 , such as software function modules and computer programs included in the human-machine dialogue device 100 .
  • the computer device 10 uses the human-machine dialogue device 100 to implement multi-round human-machine conversations in the low-resource language by borrowing a multi-round conversation generation model of the high-resource language, and at the same time combines the specific conditions of the existing dialogue content into the process of inter-translation between low-resource language sentences and high-resource language sentences, improving the cohesion and coherence of the content of multi-round human-machine conversations in the low-resource language in terms of semantics and context.
  • FIG. 1 is only a schematic diagram of the composition of the computer device 10; the computer device 10 may further include more or fewer components than those shown in FIG. 1, or have a configuration different from that shown in FIG. 1. Each component shown in FIG. 1 may be implemented in hardware, software, or a combination thereof.
  • the embodiment of the present application achieves the foregoing object by providing a man-machine dialogue method.
  • the man-machine dialogue method provided by the present application will be described in detail below.
  • FIG. 2 is one of the schematic flowcharts of the man-machine dialogue method provided by the embodiment of the present application.
  • the man-machine dialogue method shown in FIG. 2 may include steps S210 to S250.
  • Step S210 acquiring the first language input sentence of the current conversation round.
  • the input sentence in the first language is a sentence input by the user in a low-resource language as the query text.
  • the computer device 10 can collect the query voice spoken by the user in a low-resource language by controlling the external sound pickup device, and perform text conversion on the query voice collected by the sound pickup device to obtain the corresponding The first language input sentence; the computer device 10 can also provide the user with a text input interface, so that the user can input the first language input sentence of the current session round in the text input interface by himself.
  • Step S220 according to the first language dialogue content of the historical conversation rounds and the second language dialogue content that has a mutual translation relationship with the first language dialogue content, translating the first language input sentence of the current session round to obtain the second language input sentence of the current session round.
  • the first language dialogue content includes the first language input sentence and the first language output sentence corresponding to each conversation round, and the second language dialogue content includes the second language input sentence and the second language output sentence corresponding to each conversation round, wherein the first language input sentence and the second language input sentence of the same conversation round have a mutual translation relationship, and the first language output sentence and the second language output sentence of the same conversation round have a mutual translation relationship; the first language is a low-resource language, and the second language is a high-resource language.
  • after acquiring the first language input sentence of the current session round, the computer device 10 will combine the first language dialogue content and the second language dialogue content of the historical session rounds before the current session round to translate the first language input sentence of the current session round, so as to fully consider the context information and word expression during the multi-round human-machine conversation in the first language and in the second language, so that the translated second language input sentence of the current session round fully matches, in terms of word choice, the first language dialogue content and the second language dialogue content of the historical session rounds.
  • the computer device 10 may store a first sentence translation model for performing sentence translation from a first language to a second language in combination with historical dialogue content.
  • the first sentence translation model can be obtained by training on parallel corpora between the second language and the first language from multiple rounds of continuous conversation in the open domain, and each parallel corpus required for training the first sentence translation model can be expressed by a corpus long sequence [..., L_{t-2}, H_{t-2}, L_{t-1}, H_{t-1}, L_t, H_t], where H represents a second language sentence, L represents a first language sentence, and an H and an L with the same subscript form a mutually translated parallel pair; the first language dialogue operation of one session round includes the input operation of the first language input sentence and the output operation of the first language output sentence in that round, and the second language dialogue operation of the same session round likewise includes the input operation of the second language input sentence and the output operation of the second language output sentence in that round.
  • when t is an even number, L_t represents the first language output sentence of the (t/2)-th session round, L_{t-1} represents the first language input sentence of the (t/2)-th session round, H_t represents the second language output sentence of the (t/2)-th session round, H_{t-1} represents the second language input sentence of the (t/2)-th session round, H_{t-2} represents the second language output sentence of the ((t-2)/2)-th session round, and L_{t-2} represents the first language output sentence of the ((t-2)/2)-th session round;
  • when t is an odd number, L_t represents the first language input sentence of the ((t+1)/2)-th session round, L_{t-1} represents the first language output sentence of the ((t-1)/2)-th session round, H_t represents the second language input sentence of the ((t+1)/2)-th session round, H_{t-1} represents the second language output sentence of the ((t-1)/2)-th session round, and H_{t-2} represents the second language input sentence of the ((t-1)/2)-th session round.
  • each sentence in the above-mentioned long corpus sequence can be expressed by a sentence sequence composed of words and punctuation in the first language or the second language.
  • for example, the sentence sequence corresponding to the Chinese sentence "你好,John" ("Hello, John") can be expressed as ["你", "好", ",", "John"]; this sentence sequence includes 4 sequence elements, and each sequence element represents one character or word of the corresponding sentence.
  • the corpus long sequence of each parallel corpus used for training the first sentence translation model needs to satisfy a first preset number of sequence element tokens, where each sequence element token carries the content of one sequence element; the first preset number may be 256.
  • if the total number of sequence elements is less than the first preset number, a specific number of blank sequence elements needs to be padded on the left side of the long sequence, and the sequence elements in the padded long sequence are then spliced from left to right to obtain the final corpus long sequence of the parallel corpus, wherein the specific number is the difference between the first preset number and the total number of sequence elements.
  • if the total number of sequence elements is greater than the first preset number, sentence truncation can be performed directly on the left side of the long sequence so that the number of sequence elements in the long sequence equals the first preset number, ensuring that the sequence elements of the most recent sentences remain on the right side of the long sequence, and the sequence elements in the truncated long sequence are then spliced from left to right to obtain the final corpus long sequence of the parallel corpus; this padding and truncation rule is sketched in code below.
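The left-padding and left-truncation rule above recurs for every long sequence in this document (the first through fourth preset numbers each default to 256). The following Python sketch is an illustration only, not code from the patent; it assumes sentences are already split into lists of sequence elements and uses a hypothetical blank element PAD:

```python
PAD = "<pad>"  # hypothetical blank sequence element

def pad_or_truncate_left(elements, max_len=256):
    """Fit a flat list of sequence elements into exactly max_len slots.

    Shorter sequences are padded with blank elements on the LEFT;
    longer sequences are truncated on the LEFT, so the most recent
    sentences (kept on the right) are always preserved.
    """
    if len(elements) < max_len:
        return [PAD] * (max_len - len(elements)) + elements
    return elements[-max_len:]  # keep the rightmost max_len elements
```

For example, pad_or_truncate_left(["你", "好", ",", "John"], max_len=8) returns four blank elements followed by the four elements of the example sentence sequence.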
  • when the above first sentence translation model is used, the first language dialogue content and the second language dialogue content of the consecutive session rounds before the current session round are selected and combined, in the order of the session rounds, with the first language input sentence of the current session round to form a first long sequence to be translated; the first long sequence to be translated is input into the first sentence translation model, and the first sentence translation model, fully considering the respective expressions of the historical dialogue content in the first language and the second language, translates a second language input sentence that matches the first language input sentence of the current conversation round in terms of word choice.
  • the first long sequence to be translated including the first language input sentence of the current session round can be expressed as [..., L_{k-2}, H_{k-2}, L_{k-1}, H_{k-1}, L_k], where k is an odd number, L_k represents the first language input sentence of the current session round, the current session round is the ((k+1)/2)-th session round, and H_k can be used to represent the second language input sentence of the current session round obtained by the translation of the first sentence translation model; the first long sequence to be translated has the same expression format as the corpus long sequences used to train the first sentence translation model, and must likewise satisfy the first preset number of sequence element tokens.
  • specifically, the sentences of the current session round and of the consecutive historical session rounds before it can be filled into a blank long sequence from right to left, in reverse order of the session rounds, so that the filled long sequence is consistent in expression with the first long sequence to be translated described above; it is then determined whether the number of filled sequence elements is less than the first preset number; if so, a specific number of blank sequence elements is padded on the left side of the long sequence, the specific number being the difference between the first preset number and the number of filled sequence elements, and the sequence elements in the padded long sequence are spliced from left to right to obtain the final first long sequence to be translated; if instead the first preset number is reached during filling, the filling operation is truncated in time on the left side of the long sequence so that the maximum number of valid sequence elements equals the first preset number, and the sequence elements in the truncated long sequence are then spliced from left to right to obtain the final first long sequence to be translated; a code sketch of this assembly follows below.
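As a minimal sketch of assembling the first long sequence to be translated, under the assumption that the dialogue history is kept as a list of (L, H) sentence pairs of already-tokenized sequence elements (reusing the pad_or_truncate_left helper from the earlier sketch):

```python
def build_first_translation_input(history, l_input, max_len=256):
    """Assemble [..., L_{k-2}, H_{k-2}, L_{k-1}, H_{k-1}, L_k].

    history: list of (l_sentence, h_sentence) element-list pairs, one
    pair per historical input or output sentence, oldest first.
    l_input: L_k, the first language input sentence of the current round.
    """
    seq = []
    for l_sentence, h_sentence in history:
        seq += l_sentence + h_sentence  # each L precedes its parallel H
    seq += l_input  # the current round's input has no H counterpart yet
    return pad_or_truncate_left(seq, max_len)
```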
  • Step S230 calling the pre-stored multi-round conversation generation model to parse the second language input sentence of the current session round according to the second language dialogue content of the historical session round, and generate the second language output sentence of the current session round.
  • the multi-round conversation generation model corresponds to the second language.
  • after the computer device 10 obtains the second language input sentence that has a mutual translation relationship with the first language input sentence of the current session round, the second language dialogue content of the consecutive historical session rounds before the current session round can be selected and spliced, in the order of the session rounds, with the second language input sentence of the current session round into a session long sequence; the session long sequence is input into the multi-round conversation generation model, and the multi-round conversation generation model generates, based on the historical second language dialogue expressions, a corresponding second language output sentence for the second language input sentence of the current session round, wherein the second language dialogue operation of one session round includes the input operation of the second language input sentence and the output operation of the second language output sentence in that round.
  • the session long sequence including the second language input sentence of the current conversation round can be expressed as [..., H_{k-2}, H_{k-1}, H_k], where H_k represents the second language input sentence of the current conversation round, the current conversation round is the ((k+1)/2)-th conversation round, H_{k+1} can be used to represent the second language output sentence of the current session round generated by the multi-round conversation generation model corresponding to the second language, and k is an odd number.
  • the session long sequence input into the multi-round conversation generation model needs to satisfy a second preset number of sequence element tokens, and the second preset number may be 256. Specifically, the sequence elements of the second language input sentence of the current session round, and the sequence elements of the second language output sentences and second language input sentences of the consecutive historical session rounds before the current session round, can be filled into a blank long sequence from right to left in reverse order of the session rounds, so that the filled long sequence is consistent in expression with the session long sequence described above; it is then determined whether the number of filled sequence elements is less than the second preset number; if so, the left side of the long sequence is padded with a number of blank sequence elements equal to the second preset number minus the number of filled sequence elements, and the sequence elements in the padded long sequence are spliced from left to right to obtain the corresponding session long sequence; if instead the second preset number is reached during filling, the filling operation is truncated in time on the left side of the long sequence so that the maximum number of valid sequence elements equals the second preset number, and the sequence elements in the truncated long sequence are spliced from left to right to obtain the corresponding session long sequence; a code sketch of this assembly follows below.
  • the first preset number is the same as the second preset number.
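Under the same assumptions as the earlier sketches, the session long sequence fed to the multi-round conversation generation model interleaves only second language sentences; a rough sketch:

```python
def build_generation_input(h_history, h_input, max_len=256):
    """Assemble [..., H_{k-2}, H_{k-1}, H_k] for the generation model.

    h_history: second language sentences (alternating inputs and
    outputs) of the consecutive historical session rounds, oldest first.
    h_input: H_k, the second language input sentence of the current round.
    """
    seq = []
    for h_sentence in h_history:
        seq += h_sentence
    seq += h_input
    return pad_or_truncate_left(seq, max_len)
```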
  • Step S240 according to the first language dialogue content and the second language dialogue content of the historical conversation rounds, and the first language input sentence and the second language input sentence of the current session round, translating the second language output sentence of the current session round to obtain at least one first language candidate result.
  • after acquiring the second language output sentence of the current session round, the computer device 10 will combine the first language dialogue content and the second language dialogue content of the historical session rounds before the current session round, as well as the first language input sentence and the second language input sentence of the current session round, to translate the second language output sentence of the current session round, so as to fully consider the context information and word expression during the multi-round human-machine conversation in the first language and in the second language; in this way, each translated first language candidate result of the current session round fully matches, in terms of word choice, the first language dialogue content of the historical session rounds and the first language input sentence of the current session round, and each first language candidate result of the current session round has a mutual translation relationship with the corresponding second language output sentence.
  • the computer device 10 may store a second sentence translation model for performing sentence translation from the second language to the first language in combination with the historical dialogue content.
  • the second sentence translation model can be obtained by training on parallel corpora between the first language and the second language from multiple rounds of continuous conversation in the open domain, and each parallel corpus required for training the second sentence translation model can also be expressed by a corpus long sequence; the second sentence translation model can realize the sentence translation function from the second language to the first language in combination with the historical dialogue content.
  • the long corpus sequence of each parallel corpus required to train the second sentence translation model needs to satisfy a third preset number of sequence element tokens.
  • specifically, the respective sequence elements of the first language dialogue content and the second language dialogue content are filled in so that the expression form of the long sequence is consistent with the corpus long sequence described above; the total number of sequence elements of the valid sentences filled into the long sequence is then counted to determine whether it is less than the third preset number; if it is less than the third preset number, a specific number of blank sequence elements needs to be padded on the left side of the long sequence, the specific number being the difference between the third preset number and the total number of sequence elements, and the sequence elements in the padded long sequence are spliced from left to right to obtain the final corpus long sequence of the parallel corpus; if it is greater than the third preset number, sentence truncation can be performed directly on the left side of the long sequence so that the number of sequence elements equals the third preset number, ensuring that the most recent sentences remain on the right side of the long sequence, and the sequence elements in the truncated long sequence are then spliced from left to right to obtain the final corpus long sequence of the parallel corpus.
  • when the above second sentence translation model needs to be used to translate the second language output sentence of the current session round, the first language dialogue content and the second language dialogue content of the consecutive session rounds before the current session round are selected and spliced, together with the first language input sentence, the second language input sentence and the second language output sentence of the current session round, in the order of the session rounds, into a second long sequence to be translated; the second long sequence to be translated is input into the second sentence translation model, and the second sentence translation model, fully considering the respective expressions of the historical dialogue content and the current session round's input sentences in the first language and the second language, translates at least one first language candidate result that matches the second language output sentence of the current session round in terms of word choice.
  • the second long sequence to be translated including the second language output sentence of the current session round can be expressed as [..., H_{k-2}, L_{k-2}, H_{k-1}, L_{k-1}, H_k, L_k, H_{k+1}], where H_{k+1} represents the second language output sentence of the current session round, the current session round is the ((k+1)/2)-th session round, and (L_{k+1}')_i can be used to represent the i-th first language candidate result of the current session round translated by the second sentence translation model, where i is greater than or equal to 1.
  • the second long sequence to be translated and the long sequence of corpus used for training the second sentence translation model have the same expression format, and both must satisfy a third preset number of sequence element tokens.
  • specifically, the sequence elements of the second language output sentence of the current session round, the sequence elements of the first language input sentence and the second language input sentence of the current session round, and the respective sequence elements of the first language dialogue content and the second language dialogue content of the consecutive historical session rounds before the current session round can be filled into a blank long sequence from right to left in reverse order of the session rounds, so that the filled long sequence is consistent in expression with the corpus long sequence described above; it is then determined whether the number of filled sequence elements is less than the third preset number; if so, a specific number of blank sequence elements is padded on the left side of the long sequence, the specific number being the difference between the third preset number and the number of filled sequence elements, and the sequence elements in the padded long sequence are spliced from left to right to obtain the final second long sequence to be translated; if instead the third preset number is reached during filling, the filling operation is truncated in time on the left side of the long sequence so that the maximum number of valid sequence elements equals the third preset number, and the sequence elements in the truncated long sequence are spliced from left to right to obtain the final second long sequence to be translated; a code sketch of this assembly follows below.
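A sketch of assembling the second long sequence to be translated, under the same assumptions as the earlier sketches; note that in this sequence each second language sentence precedes its parallel first language sentence, and the generated output H_{k+1} comes last:

```python
def build_second_translation_input(history, l_input, h_input,
                                   h_output, max_len=256):
    """Assemble [..., H_{k-2}, L_{k-2}, H_{k-1}, L_{k-1}, H_k, L_k, H_{k+1}].

    history: (l_sentence, h_sentence) pairs as in the earlier sketches.
    l_input/h_input: the current round's input pair L_k / H_k.
    h_output: H_{k+1}, the generated second language output sentence.
    """
    seq = []
    for l_sentence, h_sentence in history:
        seq += h_sentence + l_sentence  # H precedes L in this sequence
    seq += h_input + l_input + h_output
    return pad_or_truncate_left(seq, max_len)
```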
  • Step S250 determining and outputting the first language output sentence of the current conversation round from the at least one first language candidate result.
  • the computer device 10 may select, according to a specific rule or randomly, one first language candidate result from the at least one first language candidate result as the first language output sentence of the current session round, ensuring that the finally output first language output sentence of the current session round has an excellent degree of cohesion, in terms of semantics and context, with the first language input sentence of the current session round and the first language dialogue content of the historical session rounds, so that the more mature multi-round conversation generation model of the second language (the high-resource language) is borrowed for the first language (the low-resource language) to truly achieve correct and smooth multi-round human-machine conversation operation.
  • by implementing the above steps S210 to S250, the present application uses the multi-round conversation generation model of the high-resource language to realize multi-round human-machine conversation in the low-resource language, and at the same time combines the specific situation of the existing dialogue content into the inter-translation between low-resource language sentences and high-resource language sentences, so that the resulting dialogue content in the low-resource language has an excellent degree of cohesion in terms of semantics and context, and the low-resource language can borrow the multi-round conversation generation model of the high-resource language to truly achieve correct and smooth multi-round human-machine conversation operation; the overall round is sketched in code below.
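Putting steps S210 to S250 together, one session round could be orchestrated roughly as follows. This is a sketch only: the four model objects and their run/score interfaces are placeholders invented for illustration, not APIs defined by the patent.

```python
def dialogue_round(l_input, l_history, h_history, first_translator,
                   conversation_model, second_translator, coherence_model):
    """One session round of the method (steps S210 to S250)."""
    # S220: translate L_k into H_k using both dialogue histories
    h_input = first_translator.run(l_history, h_history, l_input)
    # S230: generate H_{k+1} from the second language context alone
    h_output = conversation_model.run(h_history, h_input)
    # S240: translate H_{k+1} back into one or more first language candidates
    candidates = second_translator.run(l_history, h_history,
                                       l_input, h_input, h_output)
    # S250: keep the candidate with the largest expression coherence
    l_output = max(candidates,
                   key=lambda c: coherence_model.score(l_history, l_input, c))
    # record this round so later rounds treat it as historical content
    l_history.append((l_input, l_output))
    h_history.append((h_input, h_output))
    return l_output
```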
  • step S250 may include sub-step S251 and sub-step S252, so as to further improve the quality of sentence reply in the process of human-computer conversation for the low-resource language (first language) through sub-step S251 and sub-step S252.
  • Sub-step S251 for each first language candidate result, call the pre-stored coherence evaluation model to calculate the expression coherence of the first language candidate result according to the first language dialogue content of the historical conversation rounds and the first language input sentence of the current conversation round.
  • after the computer device 10 obtains a first language candidate result of the current conversation round, it can select the first language dialogue content of the consecutive historical conversation rounds before the current conversation round and splice it, in the order of the conversation rounds, with the first language input sentence of the current conversation round and the candidate result into a long sequence to be evaluated; the long sequence to be evaluated is input into the coherence evaluation model, and the coherence evaluation model calculates the semantic and contextual expression coherence of the first language candidate result of the current session round according to the historical first language dialogue expressions and the first language input sentence of the current session round, obtaining the expression coherence of that first language candidate result.
  • the long sequence to be evaluated including a first language candidate result of the current session round can be expressed as [..., L_{k-2}, L_{k-1}, L_k, (L_{k+1}')_i], where (L_{k+1}')_i represents the i-th first language candidate result of the current session round, i is greater than or equal to 1, L_k represents the first language input sentence of the current session round, the current session round is the ((k+1)/2)-th session round, L_{k+1} represents the first language output sentence finally determined for the current session round, and k is an odd number.
  • the long sequence to be evaluated that is input into the coherence evaluation model needs to satisfy a fourth preset number of sequence element tokens, and the fourth preset number may be 256.
  • specifically, the sequence elements of the first language candidate result of the current session round, the sequence elements of the first language input sentence of the current session round, and the sequence elements of the first language dialogue content of the consecutive historical session rounds before the current session round can be filled into a blank long sequence from right to left in reverse order of the session rounds, so that the filled long sequence is consistent in expression with the long sequence to be evaluated described above; it is then determined whether the number of filled sequence elements is less than the fourth preset number; if so, the left side of the long sequence is padded with a number of blank sequence elements equal to the fourth preset number minus the number of filled sequence elements, and the sequence elements in the padded long sequence are spliced from left to right to obtain the corresponding long sequence to be evaluated; if instead the fourth preset number is reached during filling, the filling operation is truncated in time on the left side of the long sequence so that the maximum number of valid sequence elements equals the fourth preset number, and the sequence elements in the truncated long sequence are spliced from left to right to obtain the corresponding long sequence to be evaluated.
  • Sub-step S252 select the first language candidate result with the largest expression coherence, and output it as the first language output sentence of the current conversation round.
  • after the computer device 10 calculates the corresponding expression coherence for each first language candidate result of the current conversation round by invoking the coherence evaluation model, it can select the first language candidate result with the largest expression coherence and output it as the first language output sentence of the current session round; that is, the content with the largest expression coherence among the (L_{k+1}')_i is selected as L_{k+1}. This ensures that the content cohesion and textual expression between the finally output first language output sentence and the first language input sentence of the current conversation round are the most natural, improving the quality of sentence reply in the process of human-machine conversation in the low-resource language; a code sketch of this selection follows below.
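A sketch of sub-steps S251 and S252 under the same assumptions; here l_history lists only first language sentences (alternating inputs and outputs, oldest first), and the coherence evaluation model is assumed to score a fully assembled long sequence to be evaluated, a lower-level variant of the placeholder interface used in the pipeline sketch above:

```python
def build_evaluation_input(l_history, l_input, candidate, max_len=256):
    """Assemble [..., L_{k-2}, L_{k-1}, L_k, (L_{k+1}')_i]: the first
    language history, the current input and one candidate result."""
    seq = []
    for l_sentence in l_history:
        seq += l_sentence
    seq += l_input + candidate
    return pad_or_truncate_left(seq, max_len)

def select_most_coherent(candidates, l_history, l_input, coherence_model):
    """Sub-step S252: keep the candidate with the largest coherence."""
    return max(candidates,
               key=lambda c: coherence_model.score(
                   build_evaluation_input(l_history, l_input, c)))
```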
  • by executing the above sub-step S251 and sub-step S252, the present application can ensure the most natural content connection and textual expression between the finally output first language output sentence of the current session round and the first language input sentence, further improving the sentence reply quality during human-machine conversations in the low-resource language.
  • the embodiment of the present application achieves the foregoing object by providing a training scheme for the coherence evaluation model, and the specific implementation process of the training scheme will be described in detail below.
  • FIG. 4 is a second schematic flowchart of a method for man-machine dialogue provided by an embodiment of the present application.
  • the man-machine dialogue method shown in FIG. 4 may include steps S260 to S280, through which the coherence evaluation model used to evaluate the expression coherence of sentences during the first language human-machine dialogue process is trained.
  • Step S260 acquiring a plurality of valid conversation material samples in the first language, wherein each valid conversation material sample includes dialogue contents of the first language in multiple consecutive conversation rounds.
  • each valid conversation material sample can be expressed by a training corpus long sequence, and each training corpus long sequence contains the first language dialogue content of multiple consecutive conversation rounds, wherein the number of valid session rounds can differ between different valid conversation material samples.
  • the training corpus long sequence and the to-be-evaluated long sequence have the same expression format, and the training corpus long sequence needs to satisfy a fourth preset number of sequence element tokens.
  • specifically, the first language output sentence or the first language input sentence of a certain conversation round in the valid conversation material sample is selected as the cutoff sentence, and then the sequence elements of the cutoff sentence and the sequence elements of the first language dialogue content generated before the cutoff sentence are filled into a blank long sequence from right to left in reverse order of the conversation rounds, so that the filled long sequence is consistent in expression with the long sequence to be evaluated described above; it is then determined whether the number of filled sequence elements is less than the fourth preset number; if so, the left side of the long sequence is padded with a number of blank sequence elements equal to the fourth preset number minus the number of filled sequence elements, and the sequence elements in the padded long sequence are spliced from left to right to obtain the training corpus long sequence corresponding to the valid conversation material sample; if instead the fourth preset number is reached during filling, the filling operation is truncated in time on the left side of the long sequence so that the maximum number of valid sequence elements equals the fourth preset number, and the sequence elements in the truncated long sequence are spliced from left to right to obtain the training corpus long sequence corresponding to the valid conversation material sample.
  • Step S270 for each valid conversation material sample, construct a negative conversation material sample corresponding to the valid conversation material sample.
  • the negative conversation material sample is used as a negative sample of the valid conversation material sample, wherein a negative conversation material sample and its corresponding valid conversation material sample differ only in the first language output sentence of the last conversation round.
  • the negative conversation material sample and the above valid conversation material sample can be expressed in the same sequence expression format, and both must satisfy the fourth preset number of sequence element tokens; comparing the negative corpus long sequence corresponding to the negative conversation material sample with the corresponding training corpus long sequence, the two differ only in the content of the sequence elements corresponding to the first language output sentence of the last conversation round.
  • FIG. 5 is a schematic flowchart of the sub-steps included in step S270 in FIG. 4.
  • the step S270 may include sub-steps S271 to S276.
  • Sub-step S271: extract the target first language output sentence corresponding to the last conversation round from the valid conversation corpus sample.
  • Sub-step S272: translate the target first language output sentence to obtain a corresponding second language expression sentence.
  • Sub-step S273: translate the second language expression sentence to obtain a corresponding first language expression sentence.
  • Sub-step S274: calculate the minimum edit distance between the target first language output sentence and the first language expression sentence.
  • Sub-step S275: compare the calculated minimum edit distance with a preset distance threshold, and determine the negative first language output sentence matching the target first language output sentence according to the comparison result.
  • Sub-step S276: replace the target first language output sentence in the valid conversation corpus sample with the negative first language output sentence, to obtain the negative conversation corpus sample.
  • when translating the target first language output sentence into the second language expression sentence, the above first sentence translation model can be used, or a general machine translation model; when translating the second language expression sentence into the first language expression sentence, the above second sentence translation model can be used, or a general machine translation model.
  • the minimum edit distance between the target first language output sentence and the first language expression sentence can be obtained by calculating the Jaccard similarity between the target first language output sentence and the first language expression sentence, as in the sketch below.
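  • the description treats the Jaccard similarity of the two sentences as the way to obtain this distance; a minimal sketch of that idea, treating each sentence as a set of characters and converting the similarity into a distance, follows. The character-level granularity and the conversion 1 - similarity are assumptions made for illustration:

```python
def jaccard_distance(sentence_a: str, sentence_b: str) -> float:
    """Distance derived from the Jaccard similarity of two sentences.

    Character-level sets and the conversion 1 - similarity are
    assumptions made for illustration.
    """
    set_a, set_b = set(sentence_a), set(sentence_b)
    if not set_a and not set_b:
        return 0.0  # two empty sentences are treated as identical
    similarity = len(set_a & set_b) / len(set_a | set_b)
    return 1.0 - similarity
```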
  • the step of determining, according to the comparison result, the negative first language output sentence matching the target first language output sentence includes:
  • if the comparison result is that the minimum edit distance is greater than the preset distance threshold, using the first language expression sentence as the negative first language output sentence; if the comparison result is that the minimum edit distance is equal to the preset distance threshold, performing synonym replacement on at least one word in the first language expression sentence to obtain the negative first language output sentence.
  • by executing the above sub-steps S271 to S276, the present application can construct, for each valid conversation corpus sample, a negative conversation corpus sample corresponding to that valid conversation corpus sample; the sketch below puts the sub-steps together.
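  • sub-steps S271 to S276 amount to a short round-trip pipeline. The sketch below reuses the jaccard_distance helper above; the translation helpers and the synonym-substitution routine passed in are hypothetical stand-ins for the sentence translation models (or a general machine translation model), and the threshold value is an assumption:

```python
DISTANCE_THRESHOLD = 0.3  # preset distance threshold; the value is an assumption

def build_negative_sample(valid_sample, translate_1_to_2, translate_2_to_1,
                          replace_one_synonym, threshold=DISTANCE_THRESHOLD):
    """Sketch of sub-steps S271-S276 under the stated assumptions."""
    target = valid_sample[-1]                    # S271: last-round output
    second_lang = translate_1_to_2(target)       # S272: into second language
    expression = translate_2_to_1(second_lang)   # S273: back into first language
    distance = jaccard_distance(target, expression)  # S274
    # S275: choose the negative output sentence by comparing with the threshold
    if distance > threshold:
        negative = expression
    else:
        # the description covers the "equal" case with synonym replacement;
        # treating "less than" the same way is an assumption
        negative = replace_one_synonym(expression)
    return valid_sample[:-1] + [negative]        # S276: swap the sentence
```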
  • Step S280: train the initial classification model by using the obtained plurality of valid conversation corpus samples and plurality of negative conversation corpus samples, to obtain the coherence evaluation model.
  • by performing the above steps S260 to S280, the present application can ensure that the trained coherence evaluation model can effectively evaluate the coherence of sentence replies during human-machine dialogue in the first language.
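  • in effect, step S280 is a binary classification task over the long sequences: valid samples are positives and the constructed negatives are negatives. A minimal sketch of assembling that training set follows; the 1/0 label convention and the helper interfaces are assumptions:

```python
def assemble_training_set(valid_samples, to_long_sequence, make_negative):
    """Label each valid conversation corpus sample 1 and its constructed
    negative counterpart 0; the label convention is an assumption."""
    examples = []
    for sample in valid_samples:
        examples.append((to_long_sequence(sample), 1))
        examples.append((to_long_sequence(make_negative(sample)), 0))
    return examples
```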
  • the application implements the aforementioned functions by dividing the human-machine dialogue device 100 into functional modules.
  • the specific composition of the human-machine dialogue device 100 provided by the present application will be described below accordingly.
  • the human-machine dialogue device 100 may include an input sentence acquisition module 110, an input sentence translation module 120, an output sentence generation module 130, an output sentence translation module 140 and an output sentence reply module 150.
  • the input sentence acquisition module 110 is configured to acquire the first language input sentence of the current conversation round.
  • the input sentence translation module 120 is configured to translate the first language input sentence of the current conversation round according to the first language dialogue content of the historical conversation rounds and the second language dialogue content that has a mutual translation relationship with the first language dialogue content, to obtain the second language input sentence of the current conversation round, wherein the first language dialogue content includes the first language input sentence and the first language output sentence of the corresponding conversation round, and the second language dialogue content includes the second language input sentence and the second language output sentence of the corresponding conversation round.
  • the output sentence generation module 130 is configured to call the pre-stored multi-round conversation generation model to parse the second language input sentence of the current conversation round according to the second language dialogue content of the historical conversation rounds, and generate the second language output sentence of the current conversation round, wherein the multi-round conversation generation model is obtained by training on multi-round conversation corpora of the second language.
  • the output sentence translation module 140 is configured to translate the second language output sentence of the current conversation round according to the first language dialogue content and the second language dialogue content of the historical conversation rounds, as well as the first language input sentence and the second language input sentence of the current conversation round, to obtain at least one first language candidate result.
  • the output sentence reply module 150 is configured to determine, from the at least one first language candidate result, the first language output sentence of the current conversation round for output.
  • FIG. 7 is a schematic diagram of the composition of the output sentence reply module 150 in FIG. 6.
  • the output sentence reply module 150 may include a sentence coherence calculation sub-module 151 and a sentence selection and output sub-module 152.
  • the sentence coherence calculation sub-module 151 is configured to, for each first language candidate result, call the pre-stored coherence evaluation model to calculate the expressive coherence of that first language candidate result according to the first language dialogue content of the historical conversation rounds and the first language input sentence of the current conversation round.
  • the sentence selection and output sub-module 152 is configured to select the first language candidate result with the greatest expressive coherence and output it as the first language output sentence of the current conversation round.
  • FIG. 8 is the second schematic diagram of the composition of the human-machine dialogue apparatus 100 provided by the embodiment of the present application.
  • the human-machine dialogue device 100 may further include a valid sample acquisition module 160, a negative sample construction module 170 and a classification model training module 180.
  • the valid sample acquisition module 160 is configured to acquire a plurality of valid conversation corpus samples in the first language, wherein each valid conversation corpus sample includes the first language dialogue content of multiple consecutive conversation rounds.
  • the negative sample construction module 170 is configured to construct, for each valid conversation corpus sample, a negative conversation corpus sample corresponding to that valid conversation corpus sample.
  • the classification model training module 180 is configured to train the initial classification model by using the obtained plurality of valid conversation corpus samples and plurality of negative conversation corpus samples, to obtain the coherence evaluation model.
  • FIG. 9 is a schematic diagram of the composition of the negative sample construction module 170 in FIG. 8.
  • the negative sample construction module 170 may include a target sentence extraction submodule 171, a target sentence translation submodule 172, an expression sentence translation submodule 173, an edit distance calculation submodule 174, a negative sentence determination submodule 175 and a target sentence replacement submodule 176.
  • the target sentence extraction sub-module 171 is configured to extract, from the valid conversation corpus sample, the target first language output sentence corresponding to the last conversation round.
  • the target sentence translation sub-module 172 is configured to translate the target first language output sentence to obtain a corresponding second language expression sentence.
  • the expression sentence translation sub-module 173 is configured to translate the second language expression sentence to obtain a corresponding first language expression sentence.
  • the edit distance calculation submodule 174 is configured to calculate the minimum edit distance between the target first language output sentence and the first language expression sentence.
  • the negative sentence determination submodule 175 is configured to compare the calculated minimum edit distance with a preset distance threshold, and determine a negative first language output sentence matching the target first language output sentence according to the comparison result.
  • the target sentence replacement sub-module 176 is configured to replace the target first language output sentence in the valid conversation corpus sample with the negative first language output sentence, to obtain the negative conversation corpus sample.
  • the manner in which the negative sentence determination sub-module 175 determines, according to the comparison result, the negative first language output sentence matching the target first language output sentence includes:
  • if the comparison result is that the minimum edit distance is greater than the preset distance threshold, using the first language expression sentence as the negative first language output sentence; if the comparison result is that the minimum edit distance is equal to the preset distance threshold, performing synonym replacement on at least one word in the first language expression sentence to obtain the negative first language output sentence.
  • the basic principles and technical effects of the human-machine dialogue device 100 provided by the embodiments of the present application are the same as those of the aforementioned human-machine dialogue method; for anything not mentioned in this embodiment, please refer to the above description of the human-machine dialogue method.
  • each block in the flowchart or block diagrams may represent a module, segment, or portion of code that contains one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by dedicated hardware-based systems that perform the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
  • each functional module in each embodiment of the present application may be integrated together to form an independent part, or each module may exist independently, or two or more modules may be integrated to form an independent part.
  • if the functions are implemented in the form of software function modules and sold or used as independent products, they may be stored in a readable storage medium.
  • in essence, the technical solution of the present application, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a readable storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned readable storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
  • in the human-machine dialogue method and device, computer equipment and readable storage medium provided by the present application, after the first language input sentence of the current conversation round is obtained, it is translated into the corresponding second language input sentence according to the first language dialogue content of the historical conversation rounds and the second language dialogue content that has a mutual translation relationship with it; the multi-round conversation generation model of the second language then parses the second language input sentence of the current conversation round according to the second language dialogue content of the historical conversation rounds to generate the corresponding second language output sentence; next, the second language output sentence is translated into at least one first language candidate result according to the dialogue content of the historical conversation rounds and the input sentences of the current conversation round; finally, the first language output sentence of the current conversation round is determined from the at least one first language candidate result for output.
  • the multi-round human-machine conversation of a low-resource language can thus be realized by borrowing the multi-round conversation generation model of a high-resource language, the mutual translation between low-resource language sentences and high-resource language sentences is grounded in the existing dialogue content, and the cohesion of the multi-round human-machine conversation content in the low-resource language is improved in terms of semantics and context. One round of this method is sketched below.
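  • read as a whole, one conversation round follows a fixed five-stage pipeline. The sketch below ties the stages together; every function it calls (translate_in, generate_reply, translate_out, score_coherence) is a hypothetical stand-in for the corresponding model described above, not an API defined by the patent:

```python
def dialogue_round(user_input, history_l1, history_l2, translate_in,
                   generate_reply, translate_out, score_coherence):
    """One conversation round of the described method, under the stated
    assumptions; history_l1/history_l2 hold the first/second language
    dialogue content of the historical rounds, kept in step."""
    # steps S210-S220: translate the input using both dialogue histories
    l2_input = translate_in(user_input, history_l1, history_l2)
    # step S230: generate the second language reply from the L2 history
    l2_output = generate_reply(l2_input, history_l2)
    # step S240: translate back, yielding one or more L1 candidates
    candidates = translate_out(l2_output, history_l1, history_l2,
                               user_input, l2_input)
    # step S250: keep the candidate judged most coherent in context
    l1_output = max(candidates,
                    key=lambda c: score_coherence(c, history_l1, user_input))
    # record the round so later rounds can use it as history
    history_l1.extend([user_input, l1_output])
    history_l2.extend([l2_input, l2_output])
    return l1_output
```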

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Acoustics & Sound (AREA)
  • Machine Translation (AREA)

Abstract

A human-machine dialogue method and apparatus, a computer device, and a readable storage medium, relating to the technical field of human-machine dialogue. After the first language input sentence of the current conversation round is acquired, it is translated into a second language input sentence according to the first language dialogue content and the second language dialogue content of the historical conversation rounds; a multi-round conversation generation model of the second language then parses the second language input sentence according to the second language dialogue content to generate a second language output sentence; the second language output sentence is then translated according to the dialogue content in both languages and the input sentences in both languages, and the first language output sentence of the current conversation round is finally determined for output. Multi-round human-machine conversation in a low-resource language can thus be realized by borrowing the multi-round conversation generation model of a high-resource language, while the semantic and contextual cohesion of the multi-round human-machine conversation in the low-resource language is improved.

Description

Human-machine dialogue method and apparatus, computer device and readable storage medium
Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. 202011591934.9, entitled "Human-machine dialogue method and apparatus, computer device and readable storage medium" and filed with the Chinese Patent Office on December 29, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the technical field of human-machine dialogue, and in particular to a human-machine dialogue method and apparatus, a computer device, and a readable storage medium.
Background
With the continuous development of science and technology, human-machine dialogue technology has been widely applied in scenarios such as chatbots, mobile phone assistants, intelligent customer service and voice navigation, and ensuring that multi-round human-machine conversations proceed correctly and smoothly is an important effect that human-machine dialogue technology needs to achieve. In practice, however, the richness of open-domain multi-round conversation corpora differs across languages: such corpora are extremely abundant for high-resource languages (for example, English) but relatively scarce for low-resource languages (for example, Chinese, Japanese, Korean, etc.). As a result, the multi-round conversation generation model of a high-resource language is necessarily more mature than that of a low-resource language and better able to realize correct and smooth multi-round human-machine dialogue. In this situation, how to realize correct and smooth multi-round human-machine conversation for low-resource languages is a technical problem that urgently needs to be solved.
Summary
In view of this, the objectives of this application include providing a human-machine dialogue method and apparatus, a computer device, and a readable storage medium, which can realize multi-round human-machine conversation in a low-resource language by borrowing the multi-round conversation generation model of a high-resource language while improving the semantic and contextual cohesion of the multi-round human-machine conversation content in the low-resource language.
To achieve the above objectives, the technical solutions adopted in the embodiments of this application are as follows:
In a first aspect, this application provides a human-machine dialogue method, the method including:
acquiring a first language input sentence of the current conversation round;
translating the first language input sentence of the current conversation round according to first language dialogue content of historical conversation rounds and second language dialogue content that has a mutual translation relationship with the first language dialogue content, to obtain a second language input sentence of the current conversation round, wherein the first language dialogue content includes the first language input sentence and first language output sentence of the corresponding conversation round, and the second language dialogue content includes the second language input sentence and second language output sentence of the corresponding conversation round;
calling a pre-stored multi-round conversation generation model to parse the second language input sentence of the current conversation round according to the second language dialogue content of the historical conversation rounds, and generating a second language output sentence of the current conversation round, wherein the multi-round conversation generation model is obtained by training on multi-round conversation corpora of the second language;
translating the second language output sentence of the current conversation round according to the first language dialogue content and second language dialogue content of the historical conversation rounds, as well as the first language input sentence and second language input sentence of the current conversation round, to obtain at least one first language candidate result;
determining, from the at least one first language candidate result, the first language output sentence of the current conversation round for output.
In an optional embodiment, the step of determining, from the at least one first language candidate result, the first language output sentence of the current conversation round for output includes:
for each first language candidate result, calling a pre-stored coherence evaluation model to calculate the expressive coherence of that first language candidate result according to the first language dialogue content of the historical conversation rounds and the first language input sentence of the current conversation round;
selecting the first language candidate result with the greatest expressive coherence and outputting it as the first language output sentence of the current conversation round.
In an optional embodiment, the method further includes:
acquiring a plurality of valid conversation corpus samples in the first language, wherein each valid conversation corpus sample includes first language dialogue content of multiple consecutive conversation rounds;
for each valid conversation corpus sample, constructing a negative conversation corpus sample corresponding to the valid conversation corpus sample;
training an initial classification model by using the obtained plurality of valid conversation corpus samples and plurality of negative conversation corpus samples, to obtain the coherence evaluation model.
In an optional embodiment, the step of constructing a negative conversation corpus sample corresponding to the valid conversation corpus sample includes:
extracting, from the valid conversation corpus sample, the target first language output sentence corresponding to the last conversation round;
translating the target first language output sentence to obtain a corresponding second language expression sentence;
translating the second language expression sentence to obtain a corresponding first language expression sentence;
calculating the minimum edit distance between the target first language output sentence and the first language expression sentence;
comparing the calculated minimum edit distance with a preset distance threshold, and determining, according to the comparison result, a negative first language output sentence matching the target first language output sentence;
replacing the target first language output sentence in the valid conversation corpus sample with the negative first language output sentence, to obtain the negative conversation corpus sample.
In an optional embodiment, the step of determining, according to the comparison result, a negative first language output sentence matching the target first language output sentence includes:
if the comparison result is that the minimum edit distance is greater than the preset distance threshold, using the first language expression sentence as the negative first language output sentence;
if the comparison result is that the minimum edit distance is equal to the preset distance threshold, performing synonym replacement on at least one word in the first language expression sentence to obtain the negative first language output sentence.
In a second aspect, this application provides a human-machine dialogue apparatus, the apparatus including:
an input sentence acquisition module, configured to acquire a first language input sentence of the current conversation round;
an input sentence translation module, configured to translate the first language input sentence of the current conversation round according to first language dialogue content of historical conversation rounds and second language dialogue content that has a mutual translation relationship with the first language dialogue content, to obtain a second language input sentence of the current conversation round, wherein the first language dialogue content includes the first language input sentence and first language output sentence of the corresponding conversation round, and the second language dialogue content includes the second language input sentence and second language output sentence of the corresponding conversation round;
an output sentence generation module, configured to call a pre-stored multi-round conversation generation model to parse the second language input sentence of the current conversation round according to the second language dialogue content of the historical conversation rounds, and generate a second language output sentence of the current conversation round, wherein the multi-round conversation generation model is obtained by training on multi-round conversation corpora of the second language;
an output sentence translation module, configured to translate the second language output sentence of the current conversation round according to the first language dialogue content and second language dialogue content of the historical conversation rounds, as well as the first language input sentence and second language input sentence of the current conversation round, to obtain at least one first language candidate result;
an output sentence reply module, configured to determine, from the at least one first language candidate result, the first language output sentence of the current conversation round for output.
In an optional embodiment, the output sentence reply module includes:
a sentence coherence calculation sub-module, configured to, for each first language candidate result, call a pre-stored coherence evaluation model to calculate the expressive coherence of that first language candidate result according to the first language dialogue content of the historical conversation rounds and the first language input sentence of the current conversation round;
a sentence selection and output sub-module, configured to select the first language candidate result with the greatest expressive coherence and output it as the first language output sentence of the current conversation round.
In an optional embodiment, the apparatus further includes:
a valid sample acquisition module, configured to acquire a plurality of valid conversation corpus samples in the first language, wherein each valid conversation corpus sample includes first language dialogue content of multiple consecutive conversation rounds;
a negative sample construction module, configured to construct, for each valid conversation corpus sample, a negative conversation corpus sample corresponding to the valid conversation corpus sample;
a classification model training module, configured to train an initial classification model by using the obtained plurality of valid conversation corpus samples and plurality of negative conversation corpus samples, to obtain the coherence evaluation model.
In an optional embodiment, the negative sample construction module includes:
a target sentence extraction sub-module, configured to extract, from the valid conversation corpus sample, the target first language output sentence corresponding to the last conversation round;
a target sentence translation sub-module, configured to translate the target first language output sentence to obtain a corresponding second language expression sentence;
an expression sentence translation sub-module, configured to translate the second language expression sentence to obtain a corresponding first language expression sentence;
an edit distance calculation sub-module, configured to calculate the minimum edit distance between the target first language output sentence and the first language expression sentence;
a negative sentence determination sub-module, configured to compare the calculated minimum edit distance with a preset distance threshold and determine, according to the comparison result, a negative first language output sentence matching the target first language output sentence;
a target sentence replacement sub-module, configured to replace the target first language output sentence in the valid conversation corpus sample with the negative first language output sentence, to obtain the negative conversation corpus sample.
In an optional embodiment, the manner in which the negative sentence determination sub-module determines, according to the comparison result, a negative first language output sentence matching the target first language output sentence includes:
if the comparison result is that the minimum edit distance is greater than the preset distance threshold, using the first language expression sentence as the negative first language output sentence;
if the comparison result is that the minimum edit distance is equal to the preset distance threshold, performing synonym replacement on at least one word in the first language expression sentence to obtain the negative first language output sentence.
In a third aspect, this application provides a computer device, the computer device including a processor and a memory, the memory storing a computer program executable by the processor, and the processor being capable of executing the computer program to implement the human-machine dialogue method of any one of the foregoing embodiments.
In a fourth aspect, this application provides a readable storage medium having a computer program stored thereon, where the computer program, when executed by a processor, implements the human-machine dialogue method of any one of the foregoing embodiments.
The beneficial effects of the embodiments of this application include the following:
After acquiring the first language input sentence of the current conversation round, this application translates it into the corresponding second language input sentence according to the first language dialogue content of the historical conversation rounds and the second language dialogue content that has a mutual translation relationship with the first language dialogue content; the multi-round conversation generation model of the second language then parses the second language input sentence of the current conversation round according to the second language dialogue content of the historical conversation rounds to generate the corresponding second language output sentence; the second language output sentence of the current conversation round is then translated into at least one first language candidate result according to the first language dialogue content and second language dialogue content of the historical conversation rounds, as well as the first language input sentence and second language input sentence of the current conversation round; and the first language output sentence of the current conversation round is finally determined from the at least one first language candidate result for output. Multi-round human-machine conversation in a low-resource language can thus be realized by borrowing the multi-round conversation generation model of a high-resource language, while the mutual translation between low-resource-language and high-resource-language sentences is carried out in combination with the specific circumstances of the existing dialogue content, improving the semantic and contextual cohesion of the multi-round human-machine conversation content in the low-resource language.
To make the above objectives, features and advantages of this application clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of this application more clearly, the drawings required by the embodiments are briefly introduced below. It should be understood that the following drawings show only certain embodiments of this application and should therefore not be regarded as limiting its scope; for those of ordinary skill in the art, other related drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic diagram of the composition of a computer device provided by an embodiment of this application;
FIG. 2 is the first schematic flowchart of the human-machine dialogue method provided by an embodiment of this application;
FIG. 3 is a schematic flowchart of the sub-steps included in step S250 in FIG. 2;
FIG. 4 is the second schematic flowchart of the human-machine dialogue method provided by an embodiment of this application;
FIG. 5 is a schematic flowchart of the sub-steps included in step S270 in FIG. 4;
FIG. 6 is the first schematic diagram of the composition of the human-machine dialogue apparatus provided by an embodiment of this application;
FIG. 7 is a schematic diagram of the composition of the output sentence reply module in FIG. 6;
FIG. 8 is the second schematic diagram of the composition of the human-machine dialogue apparatus provided by an embodiment of this application;
FIG. 9 is a schematic diagram of the composition of the negative sample construction module in FIG. 8.
Reference numerals: 10 - computer device; 11 - memory; 12 - processor; 13 - communication unit; 100 - human-machine dialogue apparatus; 110 - input sentence acquisition module; 120 - input sentence translation module; 130 - output sentence generation module; 140 - output sentence translation module; 150 - output sentence reply module; 151 - sentence coherence calculation sub-module; 152 - sentence selection and output sub-module; 160 - valid sample acquisition module; 170 - negative sample construction module; 180 - classification model training module; 171 - target sentence extraction sub-module; 172 - target sentence translation sub-module; 173 - expression sentence translation sub-module; 174 - edit distance calculation sub-module; 175 - negative sentence determination sub-module; 176 - target sentence replacement sub-module.
Detailed Description
To make the objectives, technical solutions and advantages of the embodiments of this application clearer, the technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this application. The components of the embodiments of this application described and shown in the drawings herein can generally be arranged and designed in a variety of different configurations.
Therefore, the following detailed description of the embodiments of this application provided in the drawings is not intended to limit the claimed scope of this application but merely represents selected embodiments of this application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of this application without creative effort fall within the protection scope of this application.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it does not need to be further defined or explained in subsequent drawings.
In the description of this application, it should be understood that relational terms such as "first" and "second" are used only to distinguish one entity or operation from another and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include" or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the statement "including a ..." does not exclude the existence of other identical elements in the process, method, article or device that includes the element. Those of ordinary skill in the art can understand the specific meanings of the above terms in this application according to specific situations.
One existing scheme for realizing multi-round human-machine conversation in a low-resource language uses machine translation technology: the query text in the low-resource language is first translated directly into query text in a high-resource language; then a multi-round conversation generation model trained on open-domain multi-round conversation corpora of the high-resource language generates the corresponding high-resource-language reply text for the translated query text; the generated high-resource-language reply text is then literally translated into low-resource-language reply text for output, completing one conversation round of low-resource-language human-machine dialogue. One such conversation round thus includes one input operation of low-resource-language query text and one output operation of low-resource-language reply text.
It should be noted, however, that the mutual translation between low-resource-language and high-resource-language sentences performed in this scheme easily makes the low-resource-language query text and reply text conspicuously incongruous in semantics and context; the overall cohesion of expression is poor, and the wording frequently fails to convey the intended meaning.
For example, suppose there is a Chinese query text "你喜欢游泳吗?" ("Do you like swimming?"). Through a machine translation model it is literally translated into the English query text "Do you like swimming?" and processed by the English multi-round conversation generation model. Suppose the English reply text "Yes, I do." is obtained; literal back-translation of "Yes, I do." through the machine translation model may then directly yield the Chinese reply text "是的，我做" ("Yes, I do [perform an action]") or "是的，我愿意" ("Yes, I am willing"), so that the cohesion between the Chinese query text and the Chinese reply text is poor.
Suppose there is a Chinese query text "你讨厌小孩吗?" ("Do you hate kids?"). Through a machine translation model it is literally translated into the English query text "Do you hate kids?" and processed by the English multi-round conversation generation model. Suppose the English reply text "Yes, I hate kids." is obtained; literal back-translation of "Yes, I hate kids." may then directly yield the Chinese reply text "是的，我恨小孩。" ("Yes, I hate children.") or "是的，我恨小山羊。" ("Yes, I hate baby goats." — "kid" is polysemous and can be translated as "child" or "young goat"). The former is not seriously wrong in meaning, since "讨厌" ("dislike") and "恨" ("hate") are near-synonyms, but the inconsistent wording between query and reply still makes the conversation unnatural, while the latter makes the query text and reply text lose all relevance. The low-resource-language query and reply texts are thus conspicuously incongruous in semantics and context, the overall cohesion of expression is poor, and the wording fails to convey the intended meaning.
In this situation, to ensure that a low-resource language can borrow the multi-round conversation generation model of a high-resource language and truly achieve correct and smooth multi-round human-machine conversation, the embodiments of this application provide a human-machine dialogue method and apparatus, a computer device, and a readable storage medium to realize the foregoing functions.
Some embodiments of this application are described in detail below with reference to the drawings. The following embodiments and the features in the embodiments can be combined with each other where no conflict arises.
Referring to FIG. 1, FIG. 1 is a schematic diagram of the composition of the computer device 10 provided by an embodiment of this application. In this embodiment, while realizing multi-round human-machine conversation in a low-resource language by borrowing the multi-round conversation generation model of a high-resource language, the computer device 10 incorporates the specific circumstances of the existing dialogue content into the mutual translation between low-resource-language and high-resource-language sentences, so that the resulting low-resource-language dialogue content has excellent semantic and contextual cohesion; the low-resource language can thus borrow the multi-round conversation generation model of the high-resource language to truly achieve correct and smooth multi-round human-machine conversation. The computer device 10 may be, but is not limited to, a smartphone, a tablet computer, a personal computer, a server, a dialogue robot, etc.
In this embodiment, the computer device 10 may include a memory 11, a processor 12, a communication unit 13 and the human-machine dialogue apparatus 100. The memory 11, the processor 12 and the communication unit 13 are electrically connected to one another, directly or indirectly, to realize data transmission or interaction; for example, these elements may be electrically connected to one another through one or more communication buses or signal lines.
In this embodiment, the memory 11 may be, but is not limited to, a random access memory (RAM), a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), etc. The memory 11 is used to store a computer program, and the processor 12, after receiving an execution instruction, can execute the computer program accordingly.
The memory 11 is also used to store the multi-round conversation generation model, which is obtained by training on open-domain multi-round conversation corpora of the high-resource language, so that the multi-round conversation generation model can generate, for an input sentence in that language, a corresponding output sentence as the reply content.
In this embodiment, the processor 12 may be an integrated circuit chip with signal processing capability. The processor 12 may be a general-purpose processor, including at least one of a central processing unit (CPU), a graphics processing unit (GPU), a network processor (NP), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, and discrete hardware components. The general-purpose processor may be a microprocessor or any conventional processor capable of implementing or executing the methods, steps and logical block diagrams disclosed in the embodiments of this application.
In this embodiment, the communication unit 13 is used to establish communication connections between the computer device 10 and other electronic devices through a network, and to send and receive data through the network, where the network includes wired and wireless communication networks. For example, the computer device 10 can acquire low-resource-language query text from a client device used by a user through the communication unit 13 and, after generating the corresponding reply text, output the reply text to that client device through the communication unit 13; the computer device 10 can also transmit the low-resource-language reply text it generates to an audio playback device for voice playback through the communication unit 13.
In this embodiment, the human-machine dialogue apparatus 100 includes at least one software functional module that can be stored in the memory 11 in the form of software or firmware, or solidified in the operating system of the computer device 10. The processor 12 can execute the executable modules stored in the memory 11, such as the software functional modules and computer programs included in the human-machine dialogue apparatus 100. Through the human-machine dialogue apparatus 100, the computer device 10 incorporates the specific circumstances of the existing dialogue content into the mutual translation between low-resource-language and high-resource-language sentences while borrowing the multi-round conversation generation model of the high-resource language, improving the semantic and contextual cohesion of the multi-round human-machine conversation content in the low-resource language.
It can be understood that the block diagram shown in FIG. 1 is only one schematic composition of the computer device 10; the computer device 10 may also include more or fewer components than shown in FIG. 1, or have a configuration different from that shown in FIG. 1. The components shown in FIG. 1 can be implemented in hardware, software or a combination thereof.
In this application, to ensure that the computer device 10 can realize multi-round human-machine conversation in a low-resource language by borrowing the multi-round conversation generation model of a high-resource language while improving the semantic and contextual cohesion of the multi-round human-machine conversation content in the low-resource language, the embodiments of this application achieve the foregoing objective by providing a human-machine dialogue method, which is described in detail below.
Optionally, referring to FIG. 2, FIG. 2 is the first schematic flowchart of the human-machine dialogue method provided by an embodiment of this application. In this embodiment, the human-machine dialogue method shown in FIG. 2 may include steps S210 to S250.
Step S210: acquire the first language input sentence of the current conversation round.
In this embodiment, the first language input sentence is a sentence input by the user in the low-resource language as query text. The computer device 10 may control an external sound pickup device to collect the query speech spoken by the user in the low-resource language and convert the collected query speech into text to obtain the corresponding first language input sentence; the computer device 10 may also provide the user with a text input interface through which the user enters the first language input sentence of the current conversation round.
Step S220: translate the first language input sentence of the current conversation round according to the first language dialogue content of the historical conversation rounds and the second language dialogue content that has a mutual translation relationship with the first language dialogue content, to obtain the second language input sentence of the current conversation round.
In this embodiment, the first language dialogue content includes the first language input sentence and first language output sentence of the corresponding conversation round, and the second language dialogue content includes the second language input sentence and second language output sentence of the corresponding conversation round, where the first language input sentence and the second language input sentence of the same conversation round have a mutual translation relationship, as do the first language output sentence and the second language output sentence of the same conversation round; the first language is a low-resource language and the second language is a high-resource language.
After acquiring the first language input sentence of the current conversation round, the computer device 10 translates it in combination with the first language dialogue content and second language dialogue content of the historical conversation rounds before the current round, thereby fully considering the contextual information and wording of the multi-round human-machine conversation in both the first language and the second language, so that the translated second language input sentence of the current conversation round fits the wording of the first language dialogue content and second language dialogue content of the historical conversation rounds.
In one implementation of this embodiment, the computer device 10 may store a first sentence translation model for translating sentences from the first language into the second language in combination with the historical dialogue content. The first sentence translation model can be trained on parallel corpora between the second language and the first language drawn from open-domain multi-round continuous conversations, and each parallel corpus entry required for training the first sentence translation model can be expressed as a corpus long sequence: [..., L_{t-2}, H_{t-2}, L_{t-1}, H_{t-1}, L_t, H_t], where H denotes a second language sentence, L denotes a first language sentence, and an H with the same subscript is the parallel translation of the corresponding L. The first language dialogue operation of one conversation round includes the input of the first language input sentence and the output of the first language output sentence of that round, and likewise the second language dialogue operation of the same round includes the input of the second language input sentence and the output of the second language output sentence. If t is even, L_t denotes the first language output sentence of round t/2, L_{t-1} the first language input sentence of round t/2, H_t the second language output sentence of round t/2, H_{t-1} the second language input sentence of round t/2, H_{t-2} the second language output sentence of round (t-2)/2, and L_{t-2} the first language output sentence of round (t-2)/2. If t is odd, L_t denotes the first language input sentence of round (t+1)/2, L_{t-1} the first language output sentence of round (t-1)/2, H_t the second language input sentence of round (t+1)/2, H_{t-1} the second language output sentence of round (t-1)/2, H_{t-2} the second language input sentence of round (t-1)/2, and L_{t-2} the first language input sentence of round (t-1)/2.
In the above corpus long sequence, each sentence can be expressed as a sentence sequence composed of the characters and punctuation of the first or second language. For example, the sentence "你好，John" corresponds to the sentence sequence ["你", "好", "，", "John"], which includes four sequence elements; each sequence element represents the character composition of the corresponding sentence and can represent one of the characters, words and punctuation marks of the first or second language.
During training of the first sentence translation model, for the corpus long sequence of each parallel corpus entry used, the subsequence [..., L_{t-2}, H_{t-2}, L_{t-1}, H_{t-1}, L_t] can be taken as the model input and the subsequence [H_t] as the model output, so as to ensure that the trained first sentence translation model can realize sentence translation from the first language to the second language in combination with the historical dialogue content; a sketch of this input/output split follows.
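In other words, each training example is a prefix of the interleaved bilingual sequence paired with the next second language sentence. The following short sketch carves such (input, target) pairs out of one interleaved corpus entry; the flat-list representation and the helper name are assumptions made for illustration:

```python
def make_training_pairs(interleaved):
    """interleaved = [L_1, H_1, L_2, H_2, ...] as in the corpus long
    sequence; every second language sentence H_t (odd 0-indexed
    positions) becomes a target, with everything before it as input."""
    pairs = []
    for t in range(1, len(interleaved), 2):   # positions of the H_t
        model_input = interleaved[:t]         # [..., L_{t-1}, H_{t-1}, L_t]
        model_output = interleaved[t]         # H_t
        pairs.append((model_input, model_output))
    return pairs
```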
Each corpus long sequence used to train the first sentence translation model needs to satisfy a first preset number of sequence element tokens, with each sequence element token carrying one sequence element; the first preset number may be 256.
Therefore, for each parallel corpus entry used to train the first sentence translation model, the second language output sentence or second language input sentence of a certain conversation round in that entry is first selected as the cutoff sentence. Then, in a blank long sequence, the sequence elements of the cutoff sentence, the sequence elements of the first language output sentence or first language input sentence corresponding to the cutoff sentence, and the sequence elements of the first language dialogue content and second language dialogue content of the conversation rounds before the cutoff sentence are filled in from right to left, in reverse order of the conversation rounds, so that the expression form of this long sequence is consistent with the above corpus long sequence. The total number of sequence elements of the valid sentences filled into the long sequence is then counted, and it is determined whether this total is less than the first preset number.
If the total number of sequence elements is less than the first preset number, a specific number of blank sequence elements need to be padded on the left side of the long sequence, and the sequence elements of the padded long sequence are then spliced from left to right to obtain the final corpus long sequence of that parallel corpus entry. The specific number is the difference between the first preset number and the total number of sequence elements.
If the total number of sequence elements is greater than or equal to the first preset number, the sentence content can be truncated directly on the left side of the long sequence so that the number of sequence elements in the long sequence equals the first preset number, while ensuring that the sequence elements of the cutoff sentence remain at the right side of the long sequence; the sequence elements of the truncated long sequence are then spliced from left to right to obtain the final corpus long sequence of that parallel corpus entry.
In this case, when the first sentence translation model needs to be used to translate the first language input sentence of the current conversation round, the first language dialogue content and second language dialogue content of the consecutive conversation rounds before the current round are selected and, together with the first language input sentence of the current conversation round, spliced in conversation-round order into a first to-be-translated long sequence, which is input into the first sentence translation model. The first sentence translation model fully considers how the historical dialogue content is expressed in both the first and the second language, and accordingly translates a second language input sentence whose wording matches the first language input sentence of the current conversation round. The first to-be-translated long sequence including the first language input sentence of the current conversation round can be expressed as [..., L_{k-2}, H_{k-2}, L_{k-1}, H_{k-1}, L_k], where k is odd and L_k denotes the first language input sentence of the current conversation round; the current conversation round is then round (k+1)/2, and H_k denotes the second language input sentence of the current conversation round translated by the first sentence translation model. The first to-be-translated long sequence has the same expression format as the corpus long sequences used to train the first sentence translation model, and both need to satisfy the first preset number of sequence element tokens.
It should be noted that, when the first to-be-translated long sequence is constructed, the sequence elements of the first language input sentence of the current conversation round and the sequence elements of the first language dialogue content and second language dialogue content of the historical conversation rounds before the current round can be filled into a blank long sequence from right to left, in reverse order of the conversation rounds, so that the filled long sequence is consistent in expression form with the above corpus long sequence. It is then determined whether the number of filled sequence elements of the long sequence is less than the first preset number.
If the number of filled sequence elements is less than the first preset number, a specific number of blank sequence elements need to be padded on the left side of the long sequence, and the sequence elements of the padded long sequence are then spliced from left to right to obtain the final first to-be-translated long sequence, where the specific number is the difference between the first preset number and the number of filled sequence elements. For example, if the current conversation round is the first round, there are no historical conversation rounds before it; the left end of the first to-be-translated long sequence including the first language input sentence of the current round then needs to be padded with a number of blank sequence elements equal to the first preset number minus the number of sequence elements involved in the first language input sentence of the first round.
If the number of filled sequence elements is equal to or greater than the first preset number, the filling of sequence elements is truncated in time on the left side of the long sequence so that the maximum number of valid sequence elements of the long sequence equals the first preset number, and the sequence elements of the truncated long sequence are then spliced from left to right to obtain the final first to-be-translated long sequence, as sketched below.
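At inference time, the same fixed-length rule is applied to the interleaved bilingual history plus the new input sentence. The sketch below reuses the build_long_sequence helper from the earlier sketch; the interleaving follows the [..., L_{k-2}, H_{k-2}, L_{k-1}, H_{k-1}, L_k] format described above, while the function and parameter names are assumptions:

```python
def first_to_be_translated_sequence(history_l1, history_l2, l1_input,
                                    preset_len=256):
    """Assemble [..., L_{k-2}, H_{k-2}, L_{k-1}, H_{k-1}, L_k].

    history_l1 and history_l2 are flat sentence lists, oldest first,
    with position i of history_l2 being the translation of position i
    of history_l1."""
    sentences = []
    for l1_sentence, l2_sentence in zip(history_l1, history_l2):
        sentences.append(l1_sentence)  # L_i first ...
        sentences.append(l2_sentence)  # ... then its translation H_i
    sentences.append(l1_input)         # the new first language input L_k
    # left-pad or left-truncate to the first preset number of tokens
    return build_long_sequence(sentences, preset_len)
```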
Step S230: call the pre-stored multi-round conversation generation model to parse the second language input sentence of the current conversation round according to the second language dialogue content of the historical conversation rounds, and generate the second language output sentence of the current conversation round.
In this embodiment, the multi-round conversation generation model corresponds to the second language. After the computer device 10 obtains the second language input sentence that has a mutual translation relationship with the first language input sentence of the current conversation round, it selects the second language dialogue content of the consecutive historical conversation rounds before the current round and, together with the second language input sentence of the current round, splices them in conversation-round order into a conversation long sequence, which is input into the multi-round conversation generation model. The multi-round conversation generation model generates the corresponding second language output sentence for the second language input sentence of the current round according to how the historical second language dialogue was expressed, where the second language dialogue operation of one conversation round includes the input of the second language input sentence and the output of the second language output sentence of that round. The conversation long sequence including the second language input sentence of the current round can be expressed as [..., H_{k-2}, H_{k-1}, H_k], where H_k denotes the second language input sentence of the current conversation round; the current round is then round (k+1)/2, H_{k+1} denotes the second language output sentence of the current round generated by the multi-round conversation generation model corresponding to the second language, and k is odd.
The conversation long sequence input into the multi-round conversation generation model needs to satisfy a second preset number of sequence element tokens; the second preset number may be 256. When the conversation long sequence is constructed, the sequence elements of the second language input sentence of the current round and the sequence elements of the second language output sentences and second language input sentences of the consecutive historical rounds before it can be filled into a blank long sequence from right to left, in reverse order of the conversation rounds, so that the filled long sequence is consistent in expression form with the above conversation long sequence. It is then determined whether the number of filled sequence elements of the long sequence is less than the second preset number.
If the number of filled sequence elements is less than the second preset number, the left side of the long sequence is padded with a number of blank sequence elements equal to the second preset number minus the number of filled sequence elements, and the sequence elements of the padded long sequence are then spliced from left to right to obtain the current conversation long sequence.
If the number of filled sequence elements is equal to or greater than the second preset number, the filling of sequence elements is truncated in time on the left side of the long sequence so that the maximum number of valid sequence elements equals the second preset number, and the sequence elements of the truncated long sequence are then spliced from left to right to obtain the current conversation long sequence.
In one implementation of this embodiment, the first preset number and the second preset number are the same.
Step S240: translate the second language output sentence of the current conversation round according to the first language dialogue content and second language dialogue content of the historical conversation rounds, as well as the first language input sentence and second language input sentence of the current conversation round, to obtain at least one first language candidate result.
In this embodiment, after obtaining the second language output sentence of the current conversation round, the computer device 10 translates it in combination with the first language dialogue content and second language dialogue content of the historical conversation rounds before the current round, as well as the first language input sentence and second language input sentence of the current round, thereby fully considering the contextual information and wording of the multi-round human-machine conversation in both languages, so that each translated first language candidate result of the current round fits the wording of the first language dialogue content of the historical rounds and the first language input sentence of the current round; each first language candidate result of the current round then has a mutual translation relationship with the corresponding second language output sentence.
In one implementation of this embodiment, the computer device 10 may store a second sentence translation model for translating sentences from the second language into the first language in combination with the historical dialogue content. The second sentence translation model can be trained on parallel corpora between the first language and the second language drawn from open-domain multi-round continuous conversations, and each parallel corpus entry required for training the second sentence translation model can likewise be expressed as a corpus long sequence: [..., H_{t-2}, L_{t-2}, H_{t-1}, L_{t-1}, H_t, L_t], where H denotes a second language sentence, L denotes a first language sentence, and an H with the same subscript is the parallel translation of the corresponding L; the first language dialogue operation of one conversation round includes the input of the first language input sentence and the output of the first language output sentence of that round, and likewise for the second language. The specific meanings of the subsequences in the corpus long sequences required for training the second sentence translation model are the same as those described above for the first sentence translation model and are not repeated here.
During training of the second sentence translation model, for the corpus long sequence of each parallel corpus entry used, the subsequence [..., H_{t-2}, L_{t-2}, H_{t-1}, L_{t-1}, H_t] can be taken as the model input and the subsequence [L_t] as the model output, so as to ensure that the trained second sentence translation model can realize sentence translation from the second language to the first language in combination with the historical dialogue content.
Each corpus long sequence required for training the second sentence translation model needs to satisfy a third preset number of sequence element tokens. For each parallel corpus entry used to train the second sentence translation model, the first language output sentence or first language input sentence of a certain conversation round in that entry is first selected as the cutoff sentence. Then, in a blank long sequence, the sequence elements of the cutoff sentence, the sequence elements of the second language output sentence or second language input sentence corresponding to the cutoff sentence, and the sequence elements of the first language dialogue content and second language dialogue content of the conversation rounds before the cutoff sentence are filled in from right to left, in reverse order of the conversation rounds, so that the expression form of the long sequence is consistent with the above corpus long sequence. The total number of sequence elements of the valid sentences filled into the long sequence is then counted, and it is determined whether this total is less than the third preset number.
If the total number of sequence elements is less than the third preset number, a specific number of blank sequence elements need to be padded on the left side of the long sequence, and the sequence elements of the padded long sequence are then spliced from left to right to obtain the final corpus long sequence of that parallel corpus entry. The specific number is the difference between the third preset number and the total number of sequence elements.
If the total number of sequence elements is greater than or equal to the third preset number, the sentence content can be truncated directly on the left side of the long sequence so that the number of sequence elements in the long sequence equals the third preset number, while ensuring that the sequence elements of the cutoff sentence remain at the right side of the long sequence; the sequence elements of the truncated long sequence are then spliced from left to right to obtain the final corpus long sequence of that parallel corpus entry.
Therefore, when the second sentence translation model needs to be used to translate the second language output sentence of the current conversation round, the first language dialogue content and second language dialogue content of the consecutive conversation rounds before the current round are selected and, together with the first language input sentence, second language input sentence and second language output sentence of the current round, spliced in conversation-round order into a second to-be-translated long sequence, which is input into the second sentence translation model. The second sentence translation model fully considers how the dialogue content of the historical rounds and the input sentences of the current round are expressed in both the first and the second language, and accordingly translates at least one first language candidate result whose wording matches the second language output sentence of the current round. The second to-be-translated long sequence including the second language output sentence of the current round can be expressed as [..., H_{k-2}, L_{k-2}, H_{k-1}, L_{k-1}, H_k, L_k, H_{k+1}], where H_{k+1} denotes the second language output sentence of the current conversation round; the current round is then round (k+1)/2, and (L_{k+1}')_i can denote the i-th first language candidate result of the current round translated by the second sentence translation model, where i is greater than or equal to 1. The second to-be-translated long sequence has the same expression format as the corpus long sequences used to train the second sentence translation model, and both need to satisfy the third preset number of sequence element tokens.
It should be noted that, when the second to-be-translated long sequence is constructed, the sequence elements of the second language output sentence of the current round, the sequence elements of the first language input sentence of the current round, the sequence elements of the second language input sentence of the current round, and the sequence elements of the first language dialogue content and second language dialogue content of the consecutive historical rounds before the current round can be filled into a blank long sequence from right to left, in reverse order of the conversation rounds, so that the filled long sequence is consistent in expression form with the above corpus long sequence. It is then determined whether the number of filled sequence elements of the long sequence is less than the third preset number.
If the number of filled sequence elements is less than the third preset number, a specific number of blank sequence elements need to be padded on the left side of the long sequence, and the sequence elements of the padded long sequence are then spliced from left to right to obtain the final second to-be-translated long sequence, where the specific number is the difference between the third preset number and the number of filled sequence elements.
If the number of filled sequence elements is equal to or greater than the third preset number, the filling of sequence elements is truncated in time on the left side of the long sequence so that the maximum number of valid sequence elements equals the third preset number, and the sequence elements of the truncated long sequence are then spliced from left to right to obtain the final second to-be-translated long sequence.
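The description leaves open how the second sentence translation model yields "at least one" candidate result; one common way to obtain several alternatives from a sequence model is beam search over the decoder, which the following sketch assumes. The beam_search interface, the beam width and the deduplication step are illustrative assumptions, not part of the patent:

```python
def first_language_candidates(model, second_seq, num_candidates=5):
    """Decode several distinct candidates (L_{k+1}')_i from the second
    to-be-translated long sequence; the beam-search interface and beam
    width are assumptions, the patent only requires at least one."""
    decoded = model.beam_search(second_seq, beam_width=num_candidates)
    seen, candidates = set(), []
    for sentence in decoded:          # keep beam order, drop duplicates
        if sentence not in seen:
            seen.add(sentence)
            candidates.append(sentence)
    return candidates
```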
Step S250: determine, from the at least one first language candidate result, the first language output sentence of the current conversation round for output.
In this embodiment, after the computer device 10 determines the at least one first language candidate result of the current conversation round, it may select one first language candidate result from them, according to some specific rule or at random, as the first language output sentence of the current round, so as to ensure that the finally output first language output sentence of the current round has excellent semantic and contextual cohesion with the first language input sentence of the current round and the first language dialogue content of the historical rounds; the first language (low-resource language) can thus borrow the more mature multi-round conversation generation model of the second language (high-resource language) to truly achieve correct and smooth multi-round human-machine conversation.
Thus, by executing the above steps S210 to S250, this application can, while realizing multi-round human-machine conversation in a low-resource language by borrowing the multi-round conversation generation model of a high-resource language, incorporate the specific circumstances of the existing dialogue content into the mutual translation between low-resource-language and high-resource-language sentences, so that the resulting low-resource-language dialogue content has excellent semantic and contextual cohesion, and the low-resource language can truly achieve correct and smooth multi-round human-machine conversation by borrowing the multi-round conversation generation model of the high-resource language.
Optionally, referring to FIG. 3, FIG. 3 is a schematic flowchart of the sub-steps included in step S250 in FIG. 2. In this embodiment, step S250 may include sub-steps S251 and S252, which further improve the quality of sentence replies in human-machine conversation for the low-resource language (first language).
Sub-step S251: for each first language candidate result, call the pre-stored coherence evaluation model to calculate the expressive coherence of that first language candidate result according to the first language dialogue content of the historical conversation rounds and the first language input sentence of the current conversation round.
In this embodiment, after the computer device 10 obtains a first language candidate result of the current round, it selects the first language dialogue content of the consecutive historical rounds before the current round and the first language input sentence of the current round and, together with that first language candidate result, splices them in conversation-round order into a to-be-evaluated long sequence, which is input into the coherence evaluation model. The coherence evaluation model calculates the semantic and contextual expressive coherence of that first language candidate result of the current round according to how the historical first language dialogue was expressed and the first language input sentence of the current round, obtaining the expressive coherence of that candidate result. The to-be-evaluated long sequence including the first language candidate result of the current round can be expressed as [..., L_{k-2}, L_{k-1}, L_k, (L_{k+1}')_i], where (L_{k+1}')_i denotes the i-th first language candidate result of the current round with i greater than or equal to 1, L_k denotes the first language input sentence of the current round, the current round is round (k+1)/2, L_{k+1} denotes the finally determined first language output sentence of the current round, and k is odd.
The to-be-evaluated long sequence input into the coherence evaluation model needs to satisfy a fourth preset number of sequence element tokens; the fourth preset number may be 256. When the to-be-evaluated long sequence is constructed, the sequence elements of the first language candidate result of the current round, the sequence elements of the first language input sentence of the current round, and the sequence elements of the first language dialogue content of the consecutive historical rounds before it can be filled into a blank long sequence from right to left, in reverse order of the conversation rounds, so that the filled long sequence is consistent in expression form with the above to-be-evaluated long sequence. It is then determined whether the number of filled sequence elements of the long sequence is less than the fourth preset number.
If the number of filled sequence elements is less than the fourth preset number, the left side of the long sequence is padded with a number of blank sequence elements equal to the fourth preset number minus the number of filled sequence elements, and the sequence elements of the padded long sequence are then spliced from left to right to obtain the current to-be-evaluated long sequence.
If the number of filled sequence elements is equal to or greater than the fourth preset number, the filling of sequence elements is truncated in time on the left side of the long sequence so that the maximum number of valid sequence elements equals the fourth preset number, and the sequence elements of the truncated long sequence are then spliced from left to right to obtain the current to-be-evaluated long sequence.
Sub-step S252: select the first language candidate result with the greatest expressive coherence and output it as the first language output sentence of the current conversation round.
In this embodiment, after the computer device 10 calculates, by calling the coherence evaluation model, the expressive coherence of each first language candidate result of the current round, it can select the candidate result with the greatest expressive coherence as the first language output sentence of the current round for output, that is, select from (L_{k+1}')_i the content with the greatest expressive coherence as L_{k+1}, so as to ensure that the content connection and textual expression between the finally output first language output sentence and the first language input sentence of the current round are the most natural, improving the quality of sentence replies in low-resource-language human-machine conversation; this selection is sketched below.
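Sub-steps S251 and S252 amount to scoring every candidate with the coherence evaluation model and taking the argmax. A short sketch under that reading is given below, reusing the build_long_sequence helper from the earlier sketch; coherence_model.score is a hypothetical interface for the pre-stored coherence evaluation model:

```python
def select_output_sentence(candidates, history_l1, l1_input,
                           coherence_model, preset_len=256):
    """Pick the candidate (L_{k+1}')_i with the greatest expressive
    coherence as L_{k+1}; the interface names are assumptions."""
    best_sentence, best_score = None, float("-inf")
    for candidate in candidates:
        # build the to-be-evaluated long sequence for this candidate
        seq = build_long_sequence(history_l1 + [l1_input, candidate],
                                  preset_len)
        score = coherence_model.score(seq)   # expressive coherence
        if score > best_score:
            best_sentence, best_score = candidate, score
    return best_sentence
```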
Thus, by executing the above sub-steps S251 and S252, this application can ensure that the content connection and textual expression between the finally output first language output sentence and the first language input sentence of the current conversation round are the most natural, further improving the quality of sentence replies in low-resource-language human-machine conversation.
In this application, to ensure that the coherence evaluation model stored in the computer device 10 can effectively evaluate the expressive coherence of sentence replies during first-language human-machine dialogue, the embodiments of this application achieve this objective by providing a training scheme for the coherence evaluation model, whose specific implementation is described in detail below.
Optionally, referring to FIG. 4, FIG. 4 is the second schematic flowchart of the human-machine dialogue method provided by an embodiment of this application. In this embodiment, the human-machine dialogue method shown in FIG. 4 may include steps S260 to S280, through which a coherence evaluation model capable of effectively evaluating the expressive coherence of sentence replies during first-language human-machine dialogue is trained.
Step S260: acquire a plurality of valid conversation corpus samples in the first language, wherein each valid conversation corpus sample includes first language dialogue content of multiple consecutive conversation rounds.
In this embodiment, each valid conversation corpus sample can be expressed as a training corpus long sequence, and each training corpus long sequence correspondingly includes the first language dialogue content of multiple consecutive conversation rounds, where different valid conversation corpus samples may cover different numbers of valid conversation rounds.
The training corpus long sequence has the same expression format as the to-be-evaluated long sequence, and the training corpus long sequence needs to satisfy the fourth preset number of sequence element tokens. For each valid conversation corpus sample, the first language output sentence or first language input sentence of a certain conversation round in that sample is first selected as the cutoff sentence; then, in a blank long sequence, the sequence elements of the cutoff sentence and the sequence elements of the first language dialogue content generated before the cutoff sentence are filled in from right to left, in reverse order of the conversation rounds, so that the filled long sequence is consistent in expression form with the above to-be-evaluated long sequence. It is then determined whether the number of filled sequence elements of the long sequence is less than the fourth preset number.
If the number of filled sequence elements is less than the fourth preset number, the left side of the long sequence is padded with a number of blank sequence elements equal to the fourth preset number minus the number of filled sequence elements, and the sequence elements of the padded long sequence are then spliced from left to right to obtain the training corpus long sequence corresponding to that valid conversation corpus sample.
If the number of filled sequence elements is equal to or greater than the fourth preset number, the filling of sequence elements is truncated in time on the left side of the long sequence so that the maximum number of valid sequence elements equals the fourth preset number, and the sequence elements of the truncated long sequence are then spliced from left to right to obtain the training corpus long sequence corresponding to that valid conversation corpus sample.
Step S270: for each valid conversation corpus sample, construct a negative conversation corpus sample corresponding to the valid conversation corpus sample.
In this embodiment, the negative conversation corpus sample serves as the negative sample of the valid conversation corpus sample, where a mutually corresponding negative sample and valid sample differ only in the first language output sentence of the last conversation round. In one implementation of this embodiment, the negative conversation corpus sample can be expressed in the same sequence expression format as the above valid conversation corpus sample, and both need to satisfy the fourth preset number of sequence element tokens; compared with the corresponding training corpus long sequence, the negative corpus long sequence corresponding to the negative conversation corpus sample differs from it only in the content of the sequence elements corresponding to the first language output sentence of the last conversation round.
Optionally, referring to FIG. 5, FIG. 5 is a schematic flowchart of the sub-steps included in step S270 in FIG. 4. In this embodiment, for each valid conversation corpus sample, step S270 may include sub-steps S271 to S276.
Sub-step S271: extract, from the valid conversation corpus sample, the target first language output sentence corresponding to the last conversation round.
Sub-step S272: translate the target first language output sentence to obtain the corresponding second language expression sentence.
Sub-step S273: translate the second language expression sentence to obtain the corresponding first language expression sentence.
Sub-step S274: calculate the minimum edit distance between the target first language output sentence and the first language expression sentence.
Sub-step S275: compare the calculated minimum edit distance with the preset distance threshold, and determine, according to the comparison result, the negative first language output sentence matching the target first language output sentence.
Sub-step S276: replace the target first language output sentence in the valid conversation corpus sample with the negative first language output sentence, to obtain the negative conversation corpus sample.
When the target first language output sentence is translated into the second language expression sentence, the above first sentence translation model can be used, or a general machine translation model; when the second language expression sentence is translated into the first language expression sentence, the above second sentence translation model can be used, or a general machine translation model. In one implementation of this embodiment, the minimum edit distance between the target first language output sentence and the first language expression sentence can be obtained by calculating the Jaccard similarity between the two.
The step of determining, according to the comparison result, the negative first language output sentence matching the target first language output sentence includes:
if the comparison result is that the minimum edit distance is greater than the preset distance threshold, using the first language expression sentence as the negative first language output sentence;
if the comparison result is that the minimum edit distance is equal to the preset distance threshold, performing synonym replacement on at least one word in the first language expression sentence to obtain the negative first language output sentence.
Thus, by executing the above sub-steps S271 to S276, this application can construct, for each valid conversation corpus sample, a negative conversation corpus sample corresponding to that valid conversation corpus sample.
Step S280: train the initial classification model by using the obtained plurality of valid conversation corpus samples and plurality of negative conversation corpus samples, to obtain the coherence evaluation model.
Thus, by executing the above steps S260 to S280, this application can ensure that the trained coherence evaluation model can effectively evaluate the expressive coherence of sentence replies during first-language human-machine dialogue.
In this application, to ensure that the computer device 10 can execute the above human-machine dialogue method through the human-machine dialogue apparatus 100, this application realizes the foregoing functions by dividing the human-machine dialogue apparatus 100 into functional modules. The specific composition of the human-machine dialogue apparatus 100 provided by this application is described below.
Optionally, referring to FIG. 6, FIG. 6 is the first schematic diagram of the composition of the human-machine dialogue apparatus 100 provided by an embodiment of this application. In this embodiment, the human-machine dialogue apparatus 100 may include an input sentence acquisition module 110, an input sentence translation module 120, an output sentence generation module 130, an output sentence translation module 140 and an output sentence reply module 150.
The input sentence acquisition module 110 is configured to acquire the first language input sentence of the current conversation round.
The input sentence translation module 120 is configured to translate the first language input sentence of the current conversation round according to the first language dialogue content of the historical conversation rounds and the second language dialogue content that has a mutual translation relationship with the first language dialogue content, to obtain the second language input sentence of the current conversation round, wherein the first language dialogue content includes the first language input sentence and first language output sentence of the corresponding conversation round, and the second language dialogue content includes the second language input sentence and second language output sentence of the corresponding conversation round.
The output sentence generation module 130 is configured to call the pre-stored multi-round conversation generation model to parse the second language input sentence of the current conversation round according to the second language dialogue content of the historical conversation rounds, and generate the second language output sentence of the current conversation round, wherein the multi-round conversation generation model is obtained by training on multi-round conversation corpora of the second language.
The output sentence translation module 140 is configured to translate the second language output sentence of the current conversation round according to the first language dialogue content and second language dialogue content of the historical conversation rounds, as well as the first language input sentence and second language input sentence of the current conversation round, to obtain at least one first language candidate result.
The output sentence reply module 150 is configured to determine, from the at least one first language candidate result, the first language output sentence of the current conversation round for output.
Optionally, referring to FIG. 7, FIG. 7 is a schematic diagram of the composition of the output sentence reply module 150 in FIG. 6. In this embodiment, the output sentence reply module 150 may include a sentence coherence calculation sub-module 151 and a sentence selection and output sub-module 152.
The sentence coherence calculation sub-module 151 is configured to, for each first language candidate result, call the pre-stored coherence evaluation model to calculate the expressive coherence of that first language candidate result according to the first language dialogue content of the historical conversation rounds and the first language input sentence of the current conversation round.
The sentence selection and output sub-module 152 is configured to select the first language candidate result with the greatest expressive coherence and output it as the first language output sentence of the current conversation round.
Optionally, referring to FIG. 8, FIG. 8 is the second schematic diagram of the composition of the human-machine dialogue apparatus 100 provided by an embodiment of this application. In this embodiment, the human-machine dialogue apparatus 100 may further include a valid sample acquisition module 160, a negative sample construction module 170 and a classification model training module 180.
The valid sample acquisition module 160 is configured to acquire a plurality of valid conversation corpus samples in the first language, wherein each valid conversation corpus sample includes the first language dialogue content of multiple consecutive conversation rounds.
The negative sample construction module 170 is configured to construct, for each valid conversation corpus sample, a negative conversation corpus sample corresponding to the valid conversation corpus sample.
The classification model training module 180 is configured to train the initial classification model by using the obtained plurality of valid conversation corpus samples and plurality of negative conversation corpus samples, to obtain the coherence evaluation model.
Optionally, referring to FIG. 9, FIG. 9 is a schematic diagram of the composition of the negative sample construction module 170 in FIG. 8. In this embodiment, the negative sample construction module 170 may include a target sentence extraction sub-module 171, a target sentence translation sub-module 172, an expression sentence translation sub-module 173, an edit distance calculation sub-module 174, a negative sentence determination sub-module 175 and a target sentence replacement sub-module 176.
The target sentence extraction sub-module 171 is configured to extract, from the valid conversation corpus sample, the target first language output sentence corresponding to the last conversation round.
The target sentence translation sub-module 172 is configured to translate the target first language output sentence to obtain the corresponding second language expression sentence.
The expression sentence translation sub-module 173 is configured to translate the second language expression sentence to obtain the corresponding first language expression sentence.
The edit distance calculation sub-module 174 is configured to calculate the minimum edit distance between the target first language output sentence and the first language expression sentence.
The negative sentence determination sub-module 175 is configured to compare the calculated minimum edit distance with the preset distance threshold and determine, according to the comparison result, the negative first language output sentence matching the target first language output sentence.
The target sentence replacement sub-module 176 is configured to replace the target first language output sentence in the valid conversation corpus sample with the negative first language output sentence, to obtain the negative conversation corpus sample.
The manner in which the negative sentence determination sub-module 175 determines, according to the comparison result, the negative first language output sentence matching the target first language output sentence includes:
if the comparison result is that the minimum edit distance is greater than the preset distance threshold, using the first language expression sentence as the negative first language output sentence;
if the comparison result is that the minimum edit distance is equal to the preset distance threshold, performing synonym replacement on at least one word in the first language expression sentence to obtain the negative first language output sentence.
It should be noted that the basic principles and technical effects of the human-machine dialogue apparatus 100 provided by the embodiments of this application are the same as those of the aforementioned human-machine dialogue method; for anything not mentioned in this embodiment, please refer to the above description of the human-machine dialogue method.
In the embodiments provided by this application, it should be understood that the disclosed apparatus and method may also be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the flowcharts and block diagrams in the drawings show the possible architectures, functions and operations of the apparatus, methods and computer program products according to the embodiments of this application. In this regard, each block in a flowchart or block diagram may represent a module, program segment or portion of code that contains one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by dedicated hardware-based systems that perform the specified functions or actions, or by combinations of dedicated hardware and computer instructions.
In addition, the functional modules in the embodiments of this application may be integrated together to form an independent part, each module may exist separately, or two or more modules may be integrated to form an independent part. If the functions are implemented in the form of software functional modules and sold or used as independent products, they may be stored in a readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a readable storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of this application. The aforementioned readable storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
In summary, in the human-machine dialogue method and apparatus, computer device and readable storage medium provided by this application, after the first language input sentence of the current conversation round is acquired, it is translated into the corresponding second language input sentence according to the first language dialogue content of the historical conversation rounds and the second language dialogue content that has a mutual translation relationship with the first language dialogue content; the multi-round conversation generation model of the second language then parses the second language input sentence of the current round according to the second language dialogue content of the historical rounds to generate the corresponding second language output sentence; the second language output sentence of the current round is then translated into at least one first language candidate result according to the first language dialogue content and second language dialogue content of the historical rounds, as well as the first language input sentence and second language input sentence of the current round; finally, the first language output sentence of the current round is determined from the at least one first language candidate result for output. Multi-round human-machine conversation in a low-resource language can thus be realized by borrowing the multi-round conversation generation model of a high-resource language, while the mutual translation between low-resource-language and high-resource-language sentences is carried out in combination with the specific circumstances of the existing dialogue content, improving the semantic and contextual cohesion of the multi-round human-machine conversation content in the low-resource language.
The above are only various embodiments of this application, but the protection scope of this application is not limited thereto; any changes or substitutions that can readily be conceived by those skilled in the art within the technical scope disclosed in this application shall be covered by the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (12)

  1. A human-machine dialogue method, characterized in that the method comprises:
    acquiring a first language input sentence of a current conversation round;
    translating the first language input sentence of the current conversation round according to first language dialogue content of historical conversation rounds and second language dialogue content that has a mutual translation relationship with the first language dialogue content, to obtain a second language input sentence of the current conversation round, wherein the first language dialogue content comprises the first language input sentence and first language output sentence of the corresponding conversation round, and the second language dialogue content comprises the second language input sentence and second language output sentence of the corresponding conversation round;
    calling a pre-stored multi-round conversation generation model to parse the second language input sentence of the current conversation round according to the second language dialogue content of the historical conversation rounds, and generating a second language output sentence of the current conversation round, wherein the multi-round conversation generation model is obtained by training on multi-round conversation corpora of the second language;
    translating the second language output sentence of the current conversation round according to the first language dialogue content and second language dialogue content of the historical conversation rounds, as well as the first language input sentence and second language input sentence of the current conversation round, to obtain at least one first language candidate result;
    determining, from the at least one first language candidate result, the first language output sentence of the current conversation round for output.
  2. The method according to claim 1, characterized in that the step of determining, from the at least one first language candidate result, the first language output sentence of the current conversation round for output comprises:
    for each first language candidate result, calling a pre-stored coherence evaluation model to calculate the expressive coherence of that first language candidate result according to the first language dialogue content of the historical conversation rounds and the first language input sentence of the current conversation round;
    selecting the first language candidate result with the greatest expressive coherence and outputting it as the first language output sentence of the current conversation round.
  3. The method according to claim 2, characterized in that the method further comprises:
    acquiring a plurality of valid conversation corpus samples in the first language, wherein each valid conversation corpus sample comprises first language dialogue content of multiple consecutive conversation rounds;
    for each valid conversation corpus sample, constructing a negative conversation corpus sample corresponding to the valid conversation corpus sample;
    training an initial classification model by using the obtained plurality of valid conversation corpus samples and plurality of negative conversation corpus samples, to obtain the coherence evaluation model.
  4. The method according to claim 3, characterized in that the step of constructing a negative conversation corpus sample corresponding to the valid conversation corpus sample comprises:
    extracting, from the valid conversation corpus sample, a target first language output sentence corresponding to the last conversation round;
    translating the target first language output sentence to obtain a corresponding second language expression sentence;
    translating the second language expression sentence to obtain a corresponding first language expression sentence;
    calculating a minimum edit distance between the target first language output sentence and the first language expression sentence;
    comparing the calculated minimum edit distance with a preset distance threshold, and determining, according to the comparison result, a negative first language output sentence matching the target first language output sentence;
    replacing the target first language output sentence in the valid conversation corpus sample with the negative first language output sentence, to obtain the negative conversation corpus sample.
  5. The method according to claim 4, characterized in that the step of determining, according to the comparison result, a negative first language output sentence matching the target first language output sentence comprises:
    if the comparison result is that the minimum edit distance is greater than the preset distance threshold, using the first language expression sentence as the negative first language output sentence;
    if the comparison result is that the minimum edit distance is equal to the preset distance threshold, performing synonym replacement on at least one word in the first language expression sentence to obtain the negative first language output sentence.
  6. A human-machine dialogue apparatus, characterized in that the apparatus comprises:
    an input sentence acquisition module, configured to acquire a first language input sentence of a current conversation round;
    an input sentence translation module, configured to translate the first language input sentence of the current conversation round according to first language dialogue content of historical conversation rounds and second language dialogue content that has a mutual translation relationship with the first language dialogue content, to obtain a second language input sentence of the current conversation round, wherein the first language dialogue content comprises the first language input sentence and first language output sentence of the corresponding conversation round, and the second language dialogue content comprises the second language input sentence and second language output sentence of the corresponding conversation round;
    an output sentence generation module, configured to call a pre-stored multi-round conversation generation model to parse the second language input sentence of the current conversation round according to the second language dialogue content of the historical conversation rounds, and generate a second language output sentence of the current conversation round, wherein the multi-round conversation generation model is obtained by training on multi-round conversation corpora of the second language;
    an output sentence translation module, configured to translate the second language output sentence of the current conversation round according to the first language dialogue content and second language dialogue content of the historical conversation rounds, as well as the first language input sentence and second language input sentence of the current conversation round, to obtain at least one first language candidate result;
    an output sentence reply module, configured to determine, from the at least one first language candidate result, the first language output sentence of the current conversation round for output.
  7. The apparatus according to claim 6, characterized in that the output sentence reply module comprises:
    a sentence coherence calculation sub-module, configured to, for each first language candidate result, call a pre-stored coherence evaluation model to calculate the expressive coherence of that first language candidate result according to the first language dialogue content of the historical conversation rounds and the first language input sentence of the current conversation round;
    a sentence selection and output sub-module, configured to select the first language candidate result with the greatest expressive coherence and output it as the first language output sentence of the current conversation round.
  8. The apparatus according to claim 7, characterized in that the apparatus further comprises:
    a valid sample acquisition module, configured to acquire a plurality of valid conversation corpus samples in the first language, wherein each valid conversation corpus sample comprises first language dialogue content of multiple consecutive conversation rounds;
    a negative sample construction module, configured to construct, for each valid conversation corpus sample, a negative conversation corpus sample corresponding to the valid conversation corpus sample;
    a classification model training module, configured to train an initial classification model by using the obtained plurality of valid conversation corpus samples and plurality of negative conversation corpus samples, to obtain the coherence evaluation model.
  9. The apparatus according to claim 8, characterized in that the negative sample construction module comprises:
    a target sentence extraction sub-module, configured to extract, from the valid conversation corpus sample, a target first language output sentence corresponding to the last conversation round;
    a target sentence translation sub-module, configured to translate the target first language output sentence to obtain a corresponding second language expression sentence;
    an expression sentence translation sub-module, configured to translate the second language expression sentence to obtain a corresponding first language expression sentence;
    an edit distance calculation sub-module, configured to calculate a minimum edit distance between the target first language output sentence and the first language expression sentence;
    a negative sentence determination sub-module, configured to compare the calculated minimum edit distance with a preset distance threshold and determine, according to the comparison result, a negative first language output sentence matching the target first language output sentence;
    a target sentence replacement sub-module, configured to replace the target first language output sentence in the valid conversation corpus sample with the negative first language output sentence, to obtain the negative conversation corpus sample.
  10. The apparatus according to claim 9, characterized in that the manner in which the negative sentence determination sub-module determines, according to the comparison result, a negative first language output sentence matching the target first language output sentence comprises:
    if the comparison result is that the minimum edit distance is greater than the preset distance threshold, using the first language expression sentence as the negative first language output sentence;
    if the comparison result is that the minimum edit distance is equal to the preset distance threshold, performing synonym replacement on at least one word in the first language expression sentence to obtain the negative first language output sentence.
  11. A computer device, characterized in that the computer device comprises a processor and a memory, the memory stores a computer program executable by the processor, and the processor can execute the computer program to implement the human-machine dialogue method according to any one of claims 1-5.
  12. A readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the human-machine dialogue method according to any one of claims 1-5.
PCT/CN2021/131221 2020-12-29 2021-11-17 Human-machine dialogue method and apparatus, computer device and readable storage medium WO2022142823A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/870,813 US20220358297A1 (en) 2020-12-29 2022-07-21 Method for human-machine dialogue, computing device and computer-readable storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011591934.9A CN112579760B (zh) 2020-12-29 2020-12-29 Human-machine dialogue method and apparatus, computer device and readable storage medium
CN202011591934.9 2020-12-29

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/870,813 Continuation US20220358297A1 (en) 2020-12-29 2022-07-21 Method for human-machine dialogue, computing device and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2022142823A1 true WO2022142823A1 (zh) 2022-07-07

Family

ID=75143912

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/131221 WO2022142823A1 (zh) 2020-12-29 2021-11-17 人机对话方法、装置、计算机设备及可读存储介质

Country Status (3)

Country Link
US (1) US20220358297A1 (zh)
CN (1) CN112579760B (zh)
WO (1) WO2022142823A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112579760B (zh) 2020-12-29 2024-01-19 深圳市优必选科技股份有限公司 Human-machine dialogue method and apparatus, computer device and readable storage medium
CN116089593B (zh) * 2023-03-24 2023-06-13 齐鲁工业大学(山东省科学院) Multi-round human-machine dialogue method and apparatus based on a time-series feature screening encoding module
CN117725414A (zh) * 2023-12-13 2024-03-19 北京海泰方圆科技股份有限公司 Method for training a content generation model, method for determining output content, apparatus and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20100041019A (ko) * 2008-10-13 2010-04-22 한국전자통신연구원 Document translation apparatus and method therefor
CN108228574A (zh) * 2017-12-07 2018-06-29 科大讯飞股份有限公司 Text translation processing method and apparatus
CN111460115A (zh) * 2020-03-17 2020-07-28 深圳市优必选科技股份有限公司 Intelligent human-machine dialogue model training method, model training apparatus and electronic device
CN112100354A (zh) * 2020-09-16 2020-12-18 北京奇艺世纪科技有限公司 Human-machine dialogue method, apparatus, device and storage medium
CN112579760A (zh) * 2020-12-29 2021-03-30 深圳市优必选科技股份有限公司 Human-machine dialogue method and apparatus, computer device and readable storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9269354B2 (en) * 2013-03-11 2016-02-23 Nuance Communications, Inc. Semantic re-ranking of NLU results in conversational dialogue applications
US11145291B2 (en) * 2018-01-31 2021-10-12 Microsoft Technology Licensing, Llc Training natural language system with generated dialogues
CN111104789B (zh) * 2019-11-22 2023-12-29 华中师范大学 Text scoring method, apparatus and system
CN111797226B (zh) * 2020-06-30 2024-04-05 北京百度网讯科技有限公司 Method and apparatus for generating meeting minutes, electronic device and readable storage medium
CN112100349B (zh) * 2020-09-03 2024-03-19 深圳数联天下智能科技有限公司 Multi-round dialogue method and apparatus, electronic device and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20100041019A (ko) * 2008-10-13 2010-04-22 한국전자통신연구원 Document translation apparatus and method therefor
CN108228574A (zh) * 2017-12-07 2018-06-29 科大讯飞股份有限公司 Text translation processing method and apparatus
CN111460115A (zh) * 2020-03-17 2020-07-28 深圳市优必选科技股份有限公司 Intelligent human-machine dialogue model training method, model training apparatus and electronic device
CN112100354A (zh) * 2020-09-16 2020-12-18 北京奇艺世纪科技有限公司 Human-machine dialogue method, apparatus, device and storage medium
CN112579760A (zh) * 2020-12-29 2021-03-30 深圳市优必选科技股份有限公司 Human-machine dialogue method and apparatus, computer device and readable storage medium

Also Published As

Publication number Publication date
CN112579760B (zh) 2024-01-19
US20220358297A1 (en) 2022-11-10
CN112579760A (zh) 2021-03-30

Similar Documents

Publication Publication Date Title
WO2022142823A1 (zh) Human-machine dialogue method and apparatus, computer device and readable storage medium
US10242667B2 Natural language generation in a spoken dialogue system
JP2021018797A (ja) Dialogue interaction method and apparatus, computer-readable storage medium, and program
KR102576505B1 (ko) Decoding network construction method, speech recognition method, device, apparatus and storage medium
TW201935273A (zh) Method and apparatus for recognizing the user intention of a sentence
WO2017166650A1 (zh) Speech recognition method and apparatus
JP6884947B2 (ja) Dialogue system and computer program therefor
WO2018153273A1 (zh) Semantic parsing method, apparatus and storage medium
US11586689B2 Electronic apparatus and controlling method thereof
CN115309877B (zh) Dialogue generation method, and dialogue model training method and apparatus
JP2023535709A (ja) Language representation model system, pre-training method, apparatus, device and medium
CN107844470B (zh) Speech data processing method and device therefor
US11636272B2 Hybrid natural language understanding
CN112307188B (zh) Dialogue generation method and system, electronic device and readable storage medium
US20220076677A1 Voice interaction method, device, and storage medium
CN113641807A (zh) Training method, apparatus, device and storage medium for a dialogue recommendation model
CN105390137A (zh) Response generation method, response generation apparatus and response generation program
CN112711943B (zh) Uyghur language identification method, apparatus and storage medium
CN109002498B (zh) Human-machine dialogue method, apparatus, device and storage medium
CN114783405B (zh) Speech synthesis method and apparatus, electronic device and storage medium
CN115620726A (zh) Speech text generation method, and training method and apparatus for a speech text generation model
US20220164545A1 Dialog action estimation device, dialog action estimation method, dialog action estimation model learning device, and program
CN115346520A (zh) Speech recognition method and apparatus, electronic device and medium
CN111353035B (zh) Human-machine dialogue method and apparatus, readable storage medium and electronic device
CN109065016B (zh) Speech synthesis method and apparatus, electronic device and non-transitory computer storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21913566

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21913566

Country of ref document: EP

Kind code of ref document: A1