WO2018153130A1 - Translation method and apparatus - Google Patents

Translation method and apparatus

Info

Publication number
WO2018153130A1
Authority
WO
WIPO (PCT)
Prior art keywords
named entity
entity
statement
named
language
Prior art date
Application number
PCT/CN2017/112384
Other languages
English (en)
French (fr)
Inventor
涂兆鹏
王龙跃
杜金华
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to EP17897771.6A (patent EP3547163A4)
Publication of WO2018153130A1
Priority to US16/452,439 (patent US11244108B2)

Classifications

    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06F ELECTRIC DIGITAL DATA PROCESSING (common to all entries below)
    • G06F40/00 Handling natural language data > G06F40/20 Natural language analysis > G06F40/237 Lexical tools
    • G06F40/00 Handling natural language data > G06F40/40 Processing or translation of natural language > G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G06F18/00 Pattern recognition > G06F18/20 Analysing > G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation > G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F40/00 Handling natural language data > G06F40/30 Semantic analysis
    • G06F40/00 Handling natural language data > G06F40/40 Processing or translation of natural language
    • G06F40/00 Handling natural language data > G06F40/40 Processing or translation of natural language > G06F40/42 Data-driven translation > G06F40/45 Example-based machine translation; Alignment
    • G06F40/00 Handling natural language data > G06F40/40 Processing or translation of natural language > G06F40/42 Data-driven translation > G06F40/47 Machine-assisted translation, e.g. using translation memory
    • G06F40/00 Handling natural language data > G06F40/40 Processing or translation of natural language > G06F40/51 Translation evaluation
    • G06F40/00 Handling natural language data > G06F40/20 Natural language analysis > G06F40/279 Recognition of textual entities > G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking > G06F40/295 Named entity recognition

Definitions

  • Embodiments of the present invention relate to the field of machine translation, and in particular, to a translation method and apparatus.
  • Machine translation refers to the process of using a computer to convert a statement in one language into a statement in another language. According to the translation method, machine translation can be roughly divided into statistics-based machine translation and analysis-based machine translation.
  • Analysis-based machine translation performs morphological, syntactic, and semantic analysis on the first-language statement, converts the analyzed structure into the second language, and then generates the corresponding second-language statement.
  • Embodiments of the invention provide a translation method and apparatus that address the poor translation quality and low accuracy of prior-art translation in the dialogue domain.
  • A translation method, comprising: obtaining a statement to be translated, wherein the statement to be translated is a statement represented in a first language in a specified dialog task; determining a first named entity set in the statement to be translated, and an entity type of each first named entity in the first named entity set, wherein the first named entity set includes at least one first named entity; determining, according to the first named entity set and the entity type of each first named entity, a second named entity set represented in a second language, wherein the second named entity set includes at least one second named entity, and the at least one second named entity corresponds to the at least one first named entity; determining a source semantic template of the statement to be translated, and obtaining a target semantic template corresponding to the source semantic template from the semantic template correspondence corresponding to the specified dialog task, wherein the semantic template correspondence is a correspondence between semantic templates represented in the first language and semantic templates represented in the second language; and determining a target translation statement according to the second named entity set and the target semantic template, wherein the target
  • Determining, according to the first named entity set and the entity type of each first named entity, the second named entity set represented in the second language includes: for each first named entity in the first named entity set, obtaining, according to the entity type of the first named entity, a second named entity corresponding to the first named entity from the named entity correspondence corresponding to the specified dialog task, thereby obtaining the second named entity set.
  • The named entity correspondence is a correspondence between named entities represented in the first language and named entities represented in the second language.
  • Because the second named entity corresponding to the first named entity is obtained from the named entity correspondence corresponding to the specified dialog task, the accuracy of translating the first named entity can be improved, and translation errors or ambiguity caused by the first named entity are avoided.
  • The method further includes: determining, according to the training corpus corresponding to the specified dialog task, the named entity correspondence corresponding to the specified dialog task, wherein the training corpus includes at least a training corpus represented in the first language and a training corpus represented in the second language that corresponds to the training corpus represented in the first language.
  • By training on the training corpus corresponding to the specified dialog task, the named entity correspondence corresponding to the specified dialog task is obtained, which ensures the validity and accuracy of translation between the named entities in the named entity correspondence. Further, when the second named entity is determined according to the named entity correspondence, the accuracy of its translation can be improved.
  • The method further includes: determining, according to the training corpus corresponding to the specified dialog task, the semantic template correspondence corresponding to the specified dialog task, wherein the training corpus includes at least a training corpus represented in the first language and a training corpus represented in the second language that corresponds to the training corpus represented in the first language.
  • The method further includes: displaying first semantic information, where the first semantic information includes the first named entity set and the entity type corresponding to each first named entity; and/or displaying second semantic information, where the second semantic information includes the second named entity set and the entity type corresponding to each second named entity.
  • The method further includes: if a modification instruction is received, obtaining the modified statement of the statement to be translated, and translating the modified statement.
  • When a dialog participant triggers a modification operation, the participant can modify the statement to be translated so that its meaning is expressed more clearly and its grammatical components are more complete. The machine translation system then re-acquires the modified statement and translates it, which further ensures the accuracy of the translation and the smooth progress of the specified dialog task.
  • The method further includes: if it is determined that a first named entity in the first named entity set does not exist in the named entity correspondence corresponding to the specified dialog task, obtaining, according to the entity type of the first named entity, a third named entity represented in the second language that corresponds to the first named entity; and updating the named entity correspondence corresponding to the specified dialog task according to the entity type of the first named entity, the first named entity, and the third named entity.
  • The third named entity may be received as manual input, or obtained through a built-in dictionary interface or translation method.
  • Once the third named entity represented in the second language is obtained and the named entity correspondence corresponding to the specified dialog task is updated, the third named entity can be used directly in subsequent translation, which improves the efficiency of subsequent translation.
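  • The update step above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the dictionary keyed by (entity type, first-language entity), the function names, and the `dictionary_fallback` stand-in for manual input or a dictionary interface are all assumptions, as is the Chinese rendering of "Hilton Hotel".

```python
def lookup_or_learn(correspondence, entity, entity_type, fallback):
    """Return the second-language entity, learning it via `fallback` if missing.

    `correspondence` is a hypothetical named entity correspondence keyed by
    (entity type, first-language entity).
    """
    key = (entity_type, entity)
    if key not in correspondence:
        # Not in the correspondence: obtain a "third named entity" and store
        # it so that later translations can reuse it directly.
        correspondence[key] = fallback(entity, entity_type)
    return correspondence[key]


fallback_calls = []


def dictionary_fallback(entity, entity_type):
    """Hypothetical stand-in for manual input or a built-in dictionary."""
    fallback_calls.append(entity)
    return {"希尔顿酒店": "Hilton Hotel"}[entity]


correspondence = {}
first = lookup_or_learn(correspondence, "希尔顿酒店", "Hotel Name", dictionary_fallback)
second = lookup_or_learn(correspondence, "希尔顿酒店", "Hotel Name", dictionary_fallback)
print(first, second, len(fallback_calls))
```

  • In this sketch the fallback is consulted only on the first call; the second call is served from the updated correspondence.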
  • A translation apparatus, comprising: an obtaining unit, configured to obtain a statement to be translated, wherein the statement to be translated is a statement represented in a first language in a specified dialog task; a first determining unit, configured to determine a first named entity set in the statement to be translated, and an entity type of each first named entity in the first named entity set, wherein the first named entity set includes at least one first named entity; the first determining unit being further configured to determine, according to the first named entity set and the entity type of each first named entity, a second named entity set represented in a second language, wherein the second named entity set includes at least one second named entity, and the at least one second named entity corresponds to the at least one first named entity; and a second determining unit, configured to determine a source semantic template of the statement to be translated, and obtain a target semantic template corresponding to the source semantic template from the semantic template correspondence corresponding to the specified dialog task;
  • the semantic template correspondence is a correspondence between a semantic template represented in the first language and a semantic template
  • The first determining unit is specifically configured to: for each first named entity in the first named entity set, obtain, according to the entity type of the first named entity, a second named entity corresponding to the first named entity from the named entity correspondence corresponding to the specified dialog task, thereby obtaining the second named entity set; wherein the named entity correspondence is a correspondence between named entities represented in the first language and named entities represented in the second language.
  • The apparatus further includes: a training unit, configured to determine, according to the training corpus corresponding to the specified dialog task, the named entity correspondence corresponding to the specified dialog task; wherein the training corpus includes at least a training corpus represented in the first language and a training corpus represented in the second language that corresponds to the training corpus represented in the first language.
  • The apparatus further includes: a training unit, configured to determine, according to the training corpus corresponding to the specified dialog task, the semantic template correspondence corresponding to the specified dialog task; wherein the training corpus includes at least a training corpus represented in the first language and a training corpus represented in the second language that corresponds to the training corpus represented in the first language.
  • The apparatus further includes: a display unit, configured to display first semantic information, where the first semantic information includes the first named entity set and the entity type corresponding to each first named entity; and/or the display unit is further configured to display second semantic information, where the second semantic information includes the second named entity set and the entity type corresponding to each second named entity.
  • The obtaining unit is further configured to: if the translation apparatus receives a modification instruction, obtain the modified statement of the statement to be translated, and translate the modified statement.
  • The obtaining unit is further configured to: if it is determined that a first named entity in the first named entity set does not exist in the named entity correspondence corresponding to the specified dialog task, obtain, according to the entity type of the first named entity, a third named entity represented in the second language that corresponds to the first named entity. The apparatus further includes: an updating unit, configured to update the named entity correspondence corresponding to the specified dialog task according to the entity type of the first named entity, the first named entity, and the third named entity.
  • A translation apparatus, comprising a memory, a processor, a bus, and a communication interface, wherein the memory stores code and data, the processor and the memory are connected by the bus, and the processor runs the code in the memory to cause the translation apparatus to perform the translation method provided by the above first aspect or any of the possible implementations of the first aspect.
  • Yet another aspect of the present application provides a computer readable storage medium having instructions stored therein that, when executed on a computer, cause the computer to perform the methods described in the above aspects.
  • Yet another aspect of the present application provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the methods described in the various aspects above.
  • FIG. 1 is a schematic structural diagram of a machine translation system according to an embodiment of the present invention.
  • FIG. 2 is a schematic structural diagram of another machine translation system according to an embodiment of the present invention.
  • FIG. 3 is a flowchart of a translation method according to an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of a translation example according to an embodiment of the present invention.
  • FIG. 6 is a flowchart of still another translation method according to an embodiment of the present invention.
  • FIG. 7 is a schematic diagram of training of a training corpus according to an embodiment of the present invention.
  • FIG. 9 is a schematic structural diagram of a translation apparatus according to an embodiment of the present invention.
  • FIG. 10 is a schematic structural diagram of another translation apparatus according to an embodiment of the present invention.
  • A specified dialog task is a particular task in dialogue-oriented translation; it may be any one of the preset tasks.
  • In a preset task, the dialogue parties converse in different languages, and the content of the conversation falls within a certain scope.
  • Preset tasks can include hotel reservations, online shopping, health consultations, restaurant reservations, airline reservations, and video conferences for international business.
  • The training corpus refers to the statement pairs in different languages that are used for training.
  • A statement pair consists of statements with the same meaning expressed in different languages.
  • For example, a statement pair for Chinese and English consists of the Chinese statement meaning "I am Amy Harris in Room 515" and the English statement "I am Amy Harris from Room 515".
  • The training corpus may cover two languages, or three or more languages. The training corpus under a specified dialog task consists of statement pairs in different languages whose content or semantics belong to the specified dialog task.
  • A named entity refers to key information in a statement.
  • The key information can be a specific number, a person name, a place name, an organization name, and so on. For example, if a statement is "I am Amy Harris in Room 515", the named entities in the statement can include: 515 and Amy Harris.
  • The entity type of a named entity represents an attribute of the named entity.
  • The entity type of a named entity is closely related to the statement containing the named entity and to the specified dialog task in which that statement occurs; the entity type can further indicate what the named entity means in its statement.
  • For example, the entity type of the named entity "515" may be a room number,
  • and the entity type of the named entity "Amy Harris" may be a customer name.
  • The named entity correspondence refers to the correspondence between the same named entities represented in different languages under the same entity type.
  • The named entity correspondence under the hotel reservation task can be as shown in Table 1 below.
  • Table 1 takes the correspondence between named entities represented in Chinese and named entities represented in English as an example.
  • A semantic template is a statement template used to represent a particular meaning.
  • In a semantic template, the specific named entities may be omitted, or may be replaced by the entity types of the named entities.
  • For example, the semantic template corresponding to "I am Amy Harris from Room 515" can be: I am <Customer Name> from <Room NO.>.
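  • As an illustration of the definition above, a semantic template can be derived by substituting each named entity with its entity type. The sketch below is only one simple way to do this; the function name is hypothetical, and treating the whole span "Room 515" as the room-number mention is an assumption made so that the output matches the template quoted in the text.

```python
def to_semantic_template(statement, entities):
    """Replace each named entity in `statement` with an entity-type placeholder.

    `entities` maps a named entity string to its entity type.
    """
    template = statement
    # Replace longer entities first so a short entity never clobbers part of
    # a longer one.
    for entity in sorted(entities, key=len, reverse=True):
        template = template.replace(entity, f"<{entities[entity]}>")
    return template


# Assumption: the mention span for the room number is "Room 515".
template = to_semantic_template(
    "I am Amy Harris from Room 515",
    {"Amy Harris": "Customer Name", "Room 515": "Room NO."},
)
print(template)  # I am <Customer Name> from <Room NO.>
```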
  • The semantic template correspondence refers to the correspondence between semantic templates that express the same meaning in different languages.
  • The semantic template correspondence under the hotel reservation task can be as shown in Table 2 below.
  • Table 2 takes the correspondence between semantic templates represented in Chinese and semantic templates represented in English as an example.
  • The semantic information represents the meaning of a statement through preset entity types and the named entities included in the statement.
  • The semantic information may include the relationship between at least one preset entity type and the at least one named entity in a statement that corresponds to that entity type.
  • The first semantic information refers to semantic information represented in the first language,
  • and the second semantic information refers to semantic information represented in the second language.
  • Suppose the first language is Chinese
  • and the second language is English.
  • The first semantic information can be as shown in Table 4 below.
  • Tables 3 and 4 below take as an example a dialogue under the hotel reservation task that includes the statements "Hello, this is Hilton Hotel" and "I want to book a double room on September 11".
  • The named entity in the first statement is: Hilton Hotel.
  • The named entities in the second statement are: September 11 and the double room.
  • The entity type corresponding to Hilton Hotel is hotel name.
  • The entity type corresponding to September 11 is date.
  • The entity type corresponding to the double room is room type.
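  • The semantic information for this example dialogue can be sketched as a small table from entity type to named entity. The layout and the function names below are assumptions for illustration only; they are not the layout of Tables 3 and 4.

```python
def build_semantic_info(recognized):
    """Group (named entity, entity type) pairs into entity type -> entities."""
    info = {}
    for entity, entity_type in recognized:
        info.setdefault(entity_type, []).append(entity)
    return info


def render(info):
    """Render the semantic information as aligned 'type | entities' rows."""
    width = max(len(entity_type) for entity_type in info)
    return "\n".join(
        f"{entity_type.ljust(width)} | {', '.join(entities)}"
        for entity_type, entities in info.items()
    )


# Entities and entity types taken from the hotel reservation example.
second_semantic_info = build_semantic_info([
    ("Hilton Hotel", "Hotel Name"),
    ("September 11", "Date"),
    ("double room", "Room Type"),
])
print(render(second_semantic_info))
```

  • Displaying such a table to both parties is one way to let each side check that the key information was understood as intended.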
  • FIG. 1 is a schematic structural diagram of a machine translation system according to an embodiment of the present invention.
  • The machine translation system includes a first determining unit 101, a second determining unit 102, and a translation unit 103.
  • The first determining unit 101 is configured to identify and translate the named entities in a statement. During named entity training, the training corpus under any specified dialog task may be used to identify named entities, define their entity types, and store each named entity together with its corresponding named entity of the same entity type in the named entity correspondence. For example, the first determining unit 101 may identify named entities through supervised learning, domain adaptation, and rule-based methods, and define the entity type of each named entity. When a statement is translated, the first determining unit 101 may identify the named entities in the statement and their entity types, and translate each named entity into the corresponding named entity in the target language according to the named entity correspondence. In addition, because the named entities identified from the training corpus may not cover all named entities, the first determining unit 101 may further be provided with a dictionary interface, so that user-defined named entities can be added through dictionary-based named entity recognition.
  • The second determining unit 102 is configured to perform semantic analysis on the statement to obtain a semantic template.
  • A semantic template can be represented in various forms, such as a logical expression or a sentence pattern.
  • The second determining unit 102 may construct a semantic template through supervised learning or rules, based on the context and using the named entities and entity types identified by the first determining unit 101.
  • The translation unit 103 is configured to translate the statement to be translated into a target translation statement.
  • The statement to be translated may be a statement expressed in the first language, for example, a statement expressed in Chinese.
  • The target translation statement may be a statement in the second language with the same meaning as the statement to be translated, for example, a statement expressed in English.
  • The machine translation system can also present the target translation statement to the user; for example, the target translation statement can be displayed to the user, or played to the user by voice, video, or the like.
  • FIG. 2 is a schematic structural diagram of another machine translation system according to an embodiment of the present invention.
  • The machine translation system translates between two languages.
  • The side corresponding to the first language is referred to as the source end, and the side corresponding to the second language is referred to as the target end.
  • The machine translation system may include a source-end first determining unit 111, a source-end second determining unit 112, a target-end first determining unit 121, a target-end second determining unit 122, and a translation unit 130.
  • The source-end first determining unit 111 and the source-end second determining unit 112 are configured to identify and analyze statements and corpora represented in the first language, to obtain the named entities and semantic templates represented in the first language.
  • The target-end first determining unit 121 and the target-end second determining unit 122 are configured to identify and analyze statements and corpora represented in the second language, to obtain the named entities and semantic templates represented in the second language.
  • The translation unit 130 is used to perform statement translation between the two languages.
  • The basic principle of the embodiments of the present invention is as follows. Named entity and entity type recognition training is performed on the training corpus of a dialog task to obtain the named entity correspondence, and semantic template recognition training is performed to obtain the semantic template correspondence.
  • When a statement is translated, the named entities in the statement to be translated are identified and translated, the source semantic template of the statement to be translated is identified, and its corresponding target semantic template is obtained.
  • The translated named entities are then filled into the target semantic template according to their entity types, which completes the translation of the statement to be translated.
  • Translating the named entities and the source semantic template of the statement to be translated through the named entity correspondence and the semantic template correspondence improves the accuracy of translating both. At the same time, displaying the semantic information lets both parties of the dialogue fully understand each other's meaning, which ensures the accuracy of the target translation statement and improves the success rate of task completion.
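  • The principle described above (translate the entities, map the template, fill the slots) can be sketched end to end. The two dictionaries below are hypothetical stand-ins for a named entity correspondence and a semantic template correspondence; their entries, the Chinese strings, and the `translate` function are assumptions for illustration, and named entity recognition is taken as given input.

```python
# Hypothetical named entity correspondence: (entity type, Chinese) -> English.
ENTITY_CORRESPONDENCE = {
    ("Room Type", "单人间"): "Single Room",
    ("Room Type", "双人间"): "Double Room",
}
# Hypothetical semantic template correspondence: Chinese -> English template.
TEMPLATE_CORRESPONDENCE = {
    "我想订一间<Room Type>": "I want to book a <Room Type>",
}


def translate(statement, recognized):
    """Translate `statement` given its recognized (entity, entity type) pairs.

    Simplification: at most one named entity per entity type.
    """
    source_template = statement
    translated = {}
    for entity, entity_type in recognized:
        # Build the source semantic template and translate the entity.
        source_template = source_template.replace(entity, f"<{entity_type}>")
        translated[entity_type] = ENTITY_CORRESPONDENCE[(entity_type, entity)]
    # Map the source template to the target template, then fill its slots.
    target = TEMPLATE_CORRESPONDENCE[source_template]
    for entity_type, entity in translated.items():
        target = target.replace(f"<{entity_type}>", entity)
    return target


print(translate("我想订一间单人间", [("单人间", "Room Type")]))
```

  • Because both lookups are restricted to one dialog task, a fixed phrase such as a room type is always rendered the same way, which is the accuracy benefit the text describes.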
  • FIG. 3 is a flowchart of a translation method according to an embodiment of the present invention. Referring to FIG. 3, the method may include the following steps.
  • Step 201: Obtain a statement to be translated, where the statement to be translated is a statement represented in the first language in the specified dialog task.
  • The specified dialog task is a particular task in dialogue-oriented translation; it can be any one of the preset tasks.
  • In a preset task, the dialogue parties converse in different languages, and the content of the conversation falls within a certain scope.
  • Preset tasks can include hotel reservations, online shopping, health consultations, restaurant reservations, airline reservations, and video conferences for international business.
  • When the two parties communicate in different languages, the machine translation system shown in FIG. 1 above can be used to translate the statements of both parties, that is, to convert a statement in one language into a statement in the other language. A statement in the two-party dialogue is called a statement to be translated.
  • The languages used by the two parties of the conversation may be referred to as a first language and a second language, and the first language and the second language are different languages.
  • For example, the first language can be Chinese and the second language English, or the first language English and the second language Chinese.
  • Step 202: Determine a first named entity set in the statement to be translated, and an entity type of each first named entity in the first named entity set, where the first named entity set includes at least one first named entity.
  • When the statement to be translated is obtained, the machine translation system can perform a series of analyses on it, such as morphological, syntactic, and semantic analysis, and perform recognition through domain adaptation, transfer learning, semi-supervised learning, and rules, to obtain the first named entity set in the statement to be translated and determine the entity type of each first named entity in the set.
  • The entity type of the first named entity determined in the embodiments of the present invention differs considerably from the entity type of a named entity determined in the prior art.
  • In the prior art, the determined entity type of a named entity is relatively broad.
  • For example, for a number, existing named entity recognition methods can only recognize that the entity type is a number, without further determining whether
  • the number is a phone number, a credit card number, or a room card number.
  • The entity type of the first named entity determined in the embodiments of the present invention is obtained through such further determination. That is, the entity type is closely related to the statement to be translated and to the specified dialog task, and can further indicate what the first named entity means in the statement to be translated.
  • For example, if the statement to be translated is "I am Amy Harris in Room 515",
  • the first named entities in the statement to be translated may include: 515 and Amy Harris.
  • In the prior art, the entity type identified for the first named entity "515" is a number, and the entity type of the first named entity "Amy Harris" is a person name.
  • In the embodiments of the present invention, the entity type determined for the first named entity "515" is a room number,
  • and the entity type of the first named entity "Amy Harris" is a customer name.
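  • The contrast with coarse entity types can be illustrated with a rule-only sketch. The text lists rules as just one of several recognition techniques (alongside domain adaptation, transfer learning, and semi-supervised learning), so this is a deliberately simplified stand-in; the two regular-expression rules and the function name are hypothetical and tied to a hotel reservation task.

```python
import re

# Hypothetical context rules for a hotel reservation task. A generic
# recognizer would label "515" only as a number; here the surrounding
# word "Room" decides the finer-grained, task-specific entity type.
RULES = [
    (re.compile(r"\bRoom\s+(\d+)\b", re.IGNORECASE), "Room NO."),
    (re.compile(r"\bI am ([A-Z][a-z]+(?: [A-Z][a-z]+)*)"), "Customer Name"),
]


def recognize(statement):
    """Return (named entity, entity type) pairs found by the context rules."""
    found = []
    for pattern, entity_type in RULES:
        for match in pattern.finditer(statement):
            found.append((match.group(1), entity_type))
    return found


print(recognize("I am Amy Harris from Room 515"))
# [('515', 'Room NO.'), ('Amy Harris', 'Customer Name')]
```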
  • Step 203: Determine a second named entity set represented in the second language according to the first named entity set and the entity type of each first named entity.
  • The second named entity set includes at least one second named entity, and the at least one second named entity corresponds to the at least one first named entity.
  • For each first named entity, the machine translation system may determine, according to the entity type of the first named entity, the corresponding second named entity represented in the second language. Doing so for the at least one first named entity in the first named entity set yields the second named entity set, which includes at least one second named entity.
  • The correspondence between the at least one first named entity and the at least one second named entity may be one-to-one or one-to-many.
  • When a first named entity corresponds to at least two second named entities,
  • the at least two second named entities have the same meaning but are different representations, such as first and NO.
  • When the machine translation system determines the second named entity set expressed in the second language according to the first named entity set and the entity type of each first named entity, it may handle each first named entity in the first named entity set in either of the following two manners.
  • In the first manner, the machine translation system directly translates the first named entity into a second named entity, thereby obtaining the second named entity set. For example, if the first named entity is September 11 and its entity type is date, the second named entity translated from September 11 is 11th of September.
  • In the second manner, the machine translation system obtains, according to the entity type of the first named entity, the second named entity corresponding to the first named entity from the named entity correspondence corresponding to the specified dialog task, thereby obtaining the second named entity set. For example, if the named entity correspondence corresponding to the specified dialog task is as shown in Table 1 above, the first named entity is single room, and its entity type is room type, the machine translation system looks up the room type in the named entity correspondence of Table 1 and obtains the second named entity corresponding to single room, namely Single Room.
  • The first named entities handled in the first manner may be common, easily translated named entities, such as times and dates.
  • The first named entities handled in the second manner may be fixed phrases, such as hotel names and room types.
  • The machine translation system may query whether the first named entity exists in the named entity correspondence. If it exists, the second named entity corresponding to the first named entity is obtained from the named entity correspondence according to the entity type of the first named entity. If it does not exist, a third named entity, expressed in the second language and corresponding to the first named entity, is obtained according to the entity type of the first named entity.
  • The method for obtaining the third named entity may include: receiving a manually input third named entity, or obtaining a third named entity by dictionary translation through a built-in dictionary interface.
  • To facilitate subsequent translation, the machine translation system may update the named entity correspondence corresponding to the specified dialog task based on the entity type of the first named entity, the first named entity, and the third named entity. Specifically, the machine translation system stores, according to the entity type of the first named entity, the first named entity and the third named entity in the correspondence between named entities expressed in the first language and named entities expressed in the second language.
  • After the machine translation system performs the update based on the entity type of the first named entity, the first named entity, and the third named entity, the named entity correspondence corresponding to the specified dialog task may be as shown in Table 5 below.
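The lookup, fallback, and update described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the table layout, the function name `lookup_or_update`, the `fallback` callback (standing in for manual input or a built-in dictionary interface), and the sample entries are all hypothetical.

```python
from typing import Callable, Dict, Tuple

# Hypothetical layout: (entity type, first-language entity) -> second-language entity.
# The entries below are illustrative and are not the contents of Table 1 or Table 5.
Correspondence = Dict[Tuple[str, str], str]


def lookup_or_update(
    table: Correspondence,
    entity: str,
    entity_type: str,
    fallback: Callable[[str, str], str],
) -> str:
    """Return the second-language entity; on a miss, obtain a third
    named entity from the fallback and store it for later translations."""
    key = (entity_type, entity)
    if key in table:
        return table[key]
    third_entity = fallback(entity, entity_type)  # manual input or dictionary lookup
    table[key] = third_entity                     # update the correspondence
    return third_entity


# Stand-in for a built-in dictionary interface (assumed data).
dictionary = {"双人房": "Twin Room"}


def dictionary_fallback(entity: str, entity_type: str) -> str:
    return dictionary[entity]


table: Correspondence = {("Room Type", "单人房"): "Single Room"}
hit = lookup_or_update(table, "单人房", "Room Type", dictionary_fallback)
miss = lookup_or_update(table, "双人房", "Room Type", dictionary_fallback)
print(hit, "|", miss)
```

Keying the table on the entity type as well as the entity text keeps the same surface string distinguishable when it carries different meanings in the dialog task.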
  • Step 204: Determine a source semantic template of the statement to be translated, and obtain a target semantic template corresponding to the source semantic template from the semantic template correspondence corresponding to the specified dialog task, where the semantic template correspondence is the correspondence between semantic templates expressed in the first language and semantic templates expressed in the second language.
  • Determining the source semantic template of the statement to be translated may be: deleting the first named entities in the statement to be translated, or replacing each of them with its corresponding entity type, which yields the source semantic template of the statement to be translated.
  • The target semantic template corresponding to the source semantic template may then be obtained from the semantic template correspondence corresponding to the specified dialog task.
  • For example, if the semantic template correspondence corresponding to the specified dialog task is as shown in Table 2 above and the statement to be translated is: I am Amy Harris of Room 515, then the source semantic template determined can be: I am <room number> <Customer Name>, and the target semantic template corresponding to the source semantic template obtained from Table 2 above is: I am <Customer Name> from <Room NO.>.
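As a rough sketch of step 204, the code below replaces each recognized entity with a slot named after its entity type and then looks the result up in a template table. The function name `source_template`, the Chinese template string used as the table key, and the single table entry are assumptions made for illustration; the real correspondence would come from Table 2.

```python
from typing import Dict, List, Optional, Tuple


def source_template(statement: str, entities: List[Tuple[str, str]]) -> str:
    """Replace each (entity text, entity type) pair with an <entity type> slot."""
    template = statement
    # Longer entities first, so a short entity cannot split a longer one.
    for text, entity_type in sorted(entities, key=lambda e: len(e[0]), reverse=True):
        template = template.replace(text, f"<{entity_type}>", 1)
    return template


# Illustrative semantic template correspondence (assumed, not the patent's Table 2).
template_table: Dict[str, str] = {
    "我是<房间号>房间的<客户姓名>": "I am <Customer Name> from <Room NO.>",
}

src = source_template(
    "我是515房间的艾米哈里斯",
    [("515", "房间号"), ("艾米哈里斯", "客户姓名")],
)
target: Optional[str] = template_table.get(src)
print(src)
print(target)
```

Because the table is keyed on the whole template rather than on individual words, the word order of the target language is stored with the template and does not have to be derived at translation time.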
  • Step 205 Determine, according to the second named entity set and the target semantic template, the target translation statement, where the target translation statement is a translated statement corresponding to the statement to be translated represented by the second language.
  • The machine translation system may fill each second named entity in the second named entity set into the corresponding position in the target semantic template according to its entity type, to obtain the target translation statement.
  • the entity type of the second named entity is consistent with the entity type of the corresponding first named entity.
  • For example, the statement to be translated is: I want to book a double room on September 11th, and the first named entity set determined according to the above steps 201-204 includes: September 11 (date) and double room (room type).
  • The corresponding second named entity set includes: 11th of September (Date) and Twin Room (Room Type), and the target semantic template is: I'd like to book a <Room Type> on <Date>.
  • The target translation statement determined according to the second named entity set and the target semantic template is: I'd like to book a twin room on 11th of September.
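Filling the target semantic template in step 205 amounts to slot substitution by entity type. The following sketch uses Python's standard `re.sub` with a replacement function; the function name `fill_template` and the dictionary layout are illustrative assumptions rather than anything specified in the text.

```python
import re
from typing import Dict


def fill_template(template: str, entities_by_type: Dict[str, str]) -> str:
    """Replace every <Entity Type> slot with the second named entity of that type."""

    def substitute(match: "re.Match[str]") -> str:
        # Raises KeyError if the template needs a type that was not extracted.
        return entities_by_type[match.group(1)]

    return re.sub(r"<([^<>]+)>", substitute, template)


sentence = fill_template(
    "I'd like to book a <Room Type> on <Date>",
    {"Room Type": "twin room", "Date": "11th of September"},
)
print(sentence)
```

Letting a missing slot raise an error, rather than leaving the slot unfilled, makes an incomplete second named entity set visible before the sentence is shown to a dialogue participant.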
  • The machine translation system can present the target translation statement to a dialogue participant in the specified dialog task. For example, after translating a statement to be translated expressed in Chinese into a target translation statement expressed in English, the system presents the English target translation statement to the dialogue participant who uses English, thereby facilitating the dialogue between the participants.
  • In the embodiment of the present invention, the first named entity set in the statement to be translated expressed in the first language and the entity type of each first named entity are determined, and the second named entity corresponding to each first named entity is determined to obtain the second named entity set. The source semantic template of the statement to be translated is determined and the corresponding target semantic template is obtained, and the target translation statement is then determined based on the second named entity set and the target semantic template; that is, the determined named entities expressed in the second language are inserted into the corresponding positions of the target semantic template. This improves the accuracy of translating the named entities and the semantic template, and thereby ensures the accuracy of statement translation and the smooth progress of the specified dialog task.
  • The method further includes: step 202a and/or step 203a.
  • Step 202a is performed after step 202, and may be performed in no particular order relative to steps 203 to 205.
  • Step 203a is performed after step 203, and may be performed in no particular order relative to steps 204 and 205.
  • Step 202a Display first semantic information, where the first semantic information includes a first named entity set and an entity type corresponding to each first named entity.
  • The machine translation system can present the first semantic information to the dialogue participants in the specified dialog task.
  • Specifically, the machine translation system may present the first semantic information, including the first named entity set and the entity type of each first named entity, to the dialogue participant who uses the first language, so that the participant can confirm that it is correct.
  • Step 203a Display second semantic information, where the second semantic information includes a second named entity set and an entity type corresponding to each second named entity.
  • By displaying the second semantic information, the machine translation system can present the second named entity set and the entity type of each second named entity to the dialogue participants in the specified dialog task. Specifically, the machine translation system may present the second semantic information, including the second named entity set and the entity type of each second named entity, to the dialogue participant who uses the second language, so that the participant can confirm that it is correct.
  • The first semantic information and the second semantic information may include information on a single statement to be translated in the specified dialog task, or information on multiple statements to be translated. That is, in the specified dialog task, the machine translation system may display the first semantic information and/or the second semantic information for each statement to be translated in the dialogue, or for multiple statements to be translated in the dialogue; this is not limited in the embodiment of the present invention.
  • The first semantic information may also include the content of the second semantic information, and the second semantic information may also include the content of the first semantic information. That is, the machine translation system can simultaneously present the content of the first semantic information and the second semantic information to one or more dialogue participants in the specified dialog task, so that the semantic information expressed in both the first language and the second language can be seen.
  • For example, the statement to be translated is: I want to reserve a double room on September 11. After the second named entity set is determined according to the above steps 201-203, the machine translation system presents the second semantic information shown in Figure 5 to the dialogue participants.
  • the first semantic information and/or the second semantic information may further include other information, which may be set according to the specific task content of the specified conversation task.
  • the second semantic information may be as shown in Table 6 below.
  • Table 6 includes the first named entity among the plurality of to-be-translated sentences, the second named entity corresponding thereto, and the entity type. It also includes a classification of such content, such as Hotel Information, Customer Information, Booking Information, and Others.
  • By displaying the first semantic information and/or the second semantic information, the dialogue participants in the specified dialog task can fully understand each other's intent and meaning from the displayed first language information and/or second language information, which ensures the accuracy of the translation and the correctness of the specified dialog task.
  • After the first semantic information and/or the second semantic information is displayed, the method further includes: if a modification instruction is received, obtaining the modified statement of the statement to be translated, and translating the modified statement.
  • A dialogue participant can trigger a modification instruction to the machine translation system through a specified operation. When the machine translation system receives the modification instruction, it obtains the modified statement of the statement to be translated and translates the modified statement according to the method described in the above steps 202-205.
  • Because the statements to be translated in the specified dialog task are colloquial, they tend to be short and may omit grammatical components. When a dialogue participant triggers the modification operation, the participant can modify the statement to be translated so that its meaning is clearer and its grammatical components are more complete. The machine translation system then re-acquires the modified statement and translates it, which further ensures the accuracy of the translation and the smooth progress of the specified dialog task.
  • The method further includes: step 200a and/or step 200b.
  • Steps 200a and 200b may be performed before step 202, and steps 200a and 200b are in no particular order.
  • In FIG. 6, the method is illustrated with both step 200a and step 200b included and located before step 201.
  • Step 200a Determine, according to the training corpus corresponding to the specified dialog task, the corresponding relationship of the named entity corresponding to the specified dialog task.
  • the training corpus includes at least a training corpus represented by the first language and a training corpus of the second language representation corresponding to the training corpus represented by the first language.
  • The training corpus may be a dialogue corpus commonly used in the specified dialog task, and may also include an extended training corpus related to the specified dialog task.
  • The extended training corpus can be selected from a large volume of conversational corpora by a data screening technique, for example by screening multilingual conversational corpora, such as movies or TV series, that are related to the specified dialog task.
  • When the machine translation system determines the named entity correspondence corresponding to the specified dialog task according to the training corpus corresponding to the specified dialog task, it can train in any of several ways. For example, it can train on the training corpus of the specified dialog task by supervised learning with a sequence labeling model, or by combining domain adaptation with rule matching, to obtain the corresponding named entity correspondence.
  • Step 200b Determine, according to the training corpus corresponding to the specified dialog task, the semantic template correspondence corresponding to the specified dialog task.
  • When the machine translation system determines the semantic template correspondence corresponding to the specified dialog task according to the training corpus corresponding to the specified dialog task, it can train on that training corpus by supervised learning and rules, combined with information such as the dialogue state and the entity types, to determine the corresponding semantic template correspondence.
  • The dialogue state is a category that represents the semantic intent of a sentence. For example, two differently worded questions that both ask for the other party's name (both rendered here as "What is your name?") share the same intent, so the same category (for example, asking for a name) can be used to represent their meaning.
  • The following uses training on a training corpus expressed in Chinese and a training corpus expressed in English as an example, with hotel reservation as the specified dialog task. The process in which the machine translation system obtains the named entity correspondence through the above step 200a and the semantic template correspondence through the above step 200b may be as shown in FIG. 7.
  • In FIG. 7, the language used by the customer is English and the language used by the hotel's customer service agent is Chinese, as an example.
  • By training on the training corpus expressed in English, the machine translation system obtains the named entities expressed in English: Kyoto Hotel, twin room, and 11th of September. By training on the corresponding training corpus expressed in Chinese, it obtains the named entities expressed in Chinese: Kyoto tourist Hotel, September 11, and double room. The entity type identified for Kyoto Hotel and Kyoto tourist Hotel is Hotel Name, the entity type of twin room and double room is Room Type, and the entity type of 11th of September and September 11 is Date. The named entity correspondence obtained by the machine translation system through training can therefore be as shown in Table 7 below.
  • The machine translation system may choose not to store, in the named entity correspondence, named entity pairs that are easy to translate directly, together with their entity types. An example is the named entity pair whose entity type is Date (September 11 and 11th of September). This saves memory space in the machine translation system, and such entities can be translated directly according to the determined entity type without affecting translation accuracy.
  • Alternatively, the machine translation system can also store such pairs in the named entity correspondence; this is not specifically limited in the embodiment of the present invention.
  • By training on the training corpus expressed in English, the machine translation system obtains the semantic templates expressed in English: "this is <Hotel Name> front desk" and "I'd like to book a <Room Type> on <Date>". By training on the corresponding training corpus expressed in Chinese, it obtains the semantic templates expressed in Chinese: "This is <hotel name> front desk" and "I want to book a <room type> of <date>". The semantic template correspondence obtained by the machine translation system through training can therefore be as shown in Table 8 below.
  • When the machine translation system translates a statement to be translated in the hotel reservation task, it may translate based on the named entity correspondence shown in Table 7 above and the semantic template correspondence shown in Table 8 above, and present the resulting target translation statement to the customer and/or the agent.
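To make the two correspondences concrete, here is a minimal sketch of how they could be collected from an aligned, already annotated sentence pair. It deliberately skips the training that the text describes (sequence labeling, domain adaptation, rule matching) and assumes the entity annotations are given. The names `to_template` and `build_tables`, the shared English type labels, and the Chinese sentence are all assumptions for illustration.

```python
from typing import Dict, List, Tuple

Entity = Tuple[str, str]               # (entity text, entity type)
Annotated = Tuple[str, List[Entity]]   # (sentence, its annotated entities)


def to_template(sentence: str, entities: List[Entity]) -> str:
    """Replace each annotated entity with a slot named after its type."""
    for text, entity_type in entities:
        sentence = sentence.replace(text, f"<{entity_type}>", 1)
    return sentence


def build_tables(pairs: List[Tuple[Annotated, Annotated]]):
    """Collect entity and template correspondences from aligned sentence pairs.

    Assumes at most one entity per type in each sentence, so that entities
    can be paired across languages by their entity type alone.
    """
    entity_table: Dict[Tuple[str, str], str] = {}
    template_table: Dict[str, str] = {}
    for (src_sent, src_ents), (tgt_sent, tgt_ents) in pairs:
        tgt_by_type = {etype: text for text, etype in tgt_ents}
        for text, etype in src_ents:
            if etype in tgt_by_type:
                entity_table[(etype, text)] = tgt_by_type[etype]
        template_table[to_template(src_sent, src_ents)] = to_template(tgt_sent, tgt_ents)
    return entity_table, template_table


corpus = [(
    ("I'd like to book a twin room on 11th of September",
     [("twin room", "Room Type"), ("11th of September", "Date")]),
    ("我想预订一间9月11日的双人房",
     [("9月11日", "Date"), ("双人房", "Room Type")]),
)]
entity_table, template_table = build_tables(corpus)
print(entity_table)
print(template_table)
```

Pairing entities by type is the simplest alignment rule that works for short task-oriented sentences; sentences with two entities of the same type would need positional or statistical alignment instead.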
  • The correspondence between the named entities expressed in the first language and the named entities expressed in the second language that is included in the named entity correspondence may be one-to-one, one-to-many, many-to-one, or many-to-many; this is not limited in the embodiment of the present invention.
  • For example, a double room can also be called a twin room, and the corresponding English can be twin room or double room.
  • The correspondence between the semantic templates expressed in the first language and the semantic templates expressed in the second language that is included in the semantic template correspondence may likewise be one-to-one, one-to-many, many-to-one, or many-to-many; this is not limited in the embodiment of the present invention.
  • The machine translation system may further normalize the entity type.
  • Normalization means standardizing different descriptions of the same thing into a single description.
  • The normalized description can be a written (formal) description or a more general description. For example, when the entity type of the first named entity Amy Harris is determined to be a variant description of the customer name, the machine translation system can normalize it to the standard description, customer name.
  • The machine translation system can also normalize the source semantic template, that is, standardize the source semantic template into a written description or a more general description.
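One simple way to realize the normalization described above is an alias table that maps variant descriptions to a single canonical label. The sketch below is illustrative only; the alias entries, the canonical labels, and the function name `normalize_entity_type` are assumptions and do not come from the patent.

```python
from typing import Dict

# Hypothetical alias table: variant description (lower case) -> canonical entity type.
ALIASES: Dict[str, str] = {
    "guest name": "Customer Name",
    "client name": "Customer Name",
    "room no.": "Room NO.",
    "room number": "Room NO.",
}


def normalize_entity_type(label: str) -> str:
    """Map a variant entity type description to its canonical form.

    Case and extra whitespace are ignored; unknown labels are returned as-is.
    """
    key = " ".join(label.lower().split())
    return ALIASES.get(key, label)


print(normalize_entity_type("Guest  Name"))
print(normalize_entity_type("Hotel Name"))
```

Normalizing before the lookup keeps the named entity correspondence and the semantic template correspondence small, since each concept needs only one key.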
  • The actual translation performance of the method for determining named entities provided by the embodiment of the present invention was evaluated on a standard data set for the hotel reservation task, and the results are shown in Table 9 below.
  • The embodiment of the present invention achieves high precision (P) and recall (R) in the identification and translation of named entities, with F values of 92.59% and 96.37%, which ensures the credibility of the named entities entering the machine translation system.
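The passage does not define the F value. Assuming it is the usual balanced F-measure (an assumption; the weighting is not stated in the text), it relates to the precision P and recall R as follows, where TP, FP, and FN denote the counts of correctly identified entities, spurious entities, and missed entities:

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F = \frac{2\,P\,R}{P + R}
```

Under this definition F is the harmonic mean of P and R, so an F value above 90% requires both precision and recall to be high at the same time.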
  • A comparison of the translation results of the method provided by the embodiment of the present invention with those of Google Translate and Translation Officer is shown in FIG. 8.
  • The accuracy of the method provided by the embodiment of the present invention is 30.7%, whereas the accuracy of Google Translate and Translation Officer is only about 15%; for translation from Chinese to English, the accuracy of the method provided by the embodiment of the present invention is 20.3%, whereas the accuracy of Google Translate and Translation Officer is only about 10%.
  • In the embodiment of the present invention, for the specified dialog task, the first named entity set in the statement to be translated expressed in the first language and the entity type of each first named entity are determined, and the second named entity corresponding to each first named entity is determined according to the named entity correspondence obtained by training on the training corpus, which improves the accuracy of named entity translation. The source semantic template of the statement to be translated is determined, and the corresponding target semantic template is obtained according to the semantic template correspondence obtained by training on the training corpus, which ensures the accuracy of semantic template translation. The target translation statement is then determined based on the second named entity set and the target semantic template; that is, the determined named entities expressed in the second language are inserted into the corresponding positions of the target semantic template. This ensures that the statement to be translated is translated accurately, so that the specified dialog task can proceed correctly and smoothly.
  • It can be understood that, in order to implement the above functions, a device (for example, a translation apparatus) includes corresponding hardware structures and/or software modules for performing the various functions.
  • In combination with the apparatus and algorithm steps of the examples described in the embodiments disclosed herein, the present application can be implemented in hardware or in a combination of hardware and computer software. Whether a function is implemented in hardware or by computer software driving hardware depends on the specific application and design constraints of the solution. A person skilled in the art can use different methods to implement the described functions for each particular application, but such implementation should not be considered to be beyond the scope of the present application.
  • The embodiment of the present application may divide the translation apparatus into function modules according to the foregoing method examples.
  • For example, each function module may correspond to one function, or two or more functions may be integrated into one processing module.
  • The integrated module can be implemented in the form of hardware or in the form of a software function module. It should be noted that the division of modules in the embodiment of the present application is schematic and is only a logical function division; an actual implementation may use another division.
  • FIG. 9 is a schematic diagram of a possible structure of the translation apparatus involved in the foregoing embodiment. The translation apparatus 300 includes: an obtaining unit 301, a first determining unit 302, a second determining unit 303, and a translation unit 304.
  • The obtaining unit 301 is configured to perform step 201 in FIG. 3, FIG. 4 or FIG. 6;
  • the first determining unit 302 is configured to perform step 202 and step 203 in FIG. 3, FIG. 4 or FIG. 6;
  • the second determining unit 303 is configured to perform step 204 in FIG. 3, FIG. 4 or FIG. 6;
  • the translation unit 304 is configured to perform step 205 in FIG. 3, FIG. 4 or FIG. 6.
  • The translation apparatus 300 may further include: a training unit 305, and/or a display unit 306, and/or an update unit 307. The training unit 305 is configured to perform step 200a and step 200b in FIG. 6; the display unit 306 is configured to perform steps 202a and 203a in FIG. 4 and FIG. 6. For all related content of the steps involved in the foregoing method embodiments, refer to the functional description of the corresponding function modules; details are not described herein again.
  • In a hardware implementation, the first determining unit 302, the second determining unit 303, the translation unit 304, the training unit 305, and the update unit 307 may be processors, the obtaining unit 301 may be an input device (such as a keyboard or a touch screen), and the display unit 306 may be a display.
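The division into units can be pictured as one class whose methods mirror the units of FIG. 9. The sketch below is a software-only illustration under several assumptions: entity recognition is reduced to a lookup in a small lexicon (the text describes trained recognition instead), entity type labels are shared between the two languages, and the Chinese sentence, the table contents, and the method names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple

Found = Tuple[str, str, str]  # (first entity, entity type, second entity)


@dataclass
class TranslationDevice:
    lexicon: Dict[str, str]                   # entity text -> entity type (stand-in recognizer)
    entity_table: Dict[Tuple[str, str], str]  # named entity correspondence
    template_table: Dict[str, str]            # semantic template correspondence

    def first_determining_unit(self, statement: str) -> List[Found]:
        """Steps 202-203: find entities, their types, and their translations."""
        return [
            (text, etype, self.entity_table[(etype, text)])
            for text, etype in self.lexicon.items()
            if text in statement
        ]

    def second_determining_unit(self, statement: str, found: List[Found]) -> str:
        """Step 204: build the source template and look up the target template."""
        template = statement
        for text, etype, _ in found:
            template = template.replace(text, f"<{etype}>", 1)
        return self.template_table[template]

    def translation_unit(self, target_template: str, found: List[Found]) -> str:
        """Step 205: fill each slot with the second entity of the same type."""
        by_type = {etype: second for _, etype, second in found}
        return re.sub(r"<([^<>]+)>", lambda m: by_type[m.group(1)], target_template)

    def translate(self, statement: str) -> str:
        found = self.first_determining_unit(statement)
        target_template = self.second_determining_unit(statement, found)
        return self.translation_unit(target_template, found)


device = TranslationDevice(
    lexicon={"9月11日": "Date", "双人房": "Room Type"},
    entity_table={
        ("Date", "9月11日"): "11th of September",
        ("Room Type", "双人房"): "twin room",
    },
    template_table={"我想预订一间<Date>的<Room Type>": "I'd like to book a <Room Type> on <Date>"},
)
result = device.translate("我想预订一间9月11日的双人房")
print(result)
```

Keeping each unit as a separate method follows the logical division in the text, so a unit such as the recognizer can be replaced by a trained model without touching the others.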
  • FIG. 10 is a schematic diagram showing a possible logical structure of a translation apparatus 310 involved in the foregoing embodiment provided by an embodiment of the present invention.
  • the translation device 310 includes a processor 312, a communication interface 313, a memory 311, an input device 314, a display 315, and a bus 316.
  • the processor 312, the communication interface 313, the memory 311, the input device 314, and the display 315 are connected to one another via a bus 316.
  • the processor 312 is configured to perform control management on the actions of the translation device 310.
  • For example, the processor 312 is configured to perform steps 202-205 in FIG. 3, FIG. 4 or FIG. 6, and steps 202a and 203a in FIG. 4 or FIG. 6.
  • Communication interface 313 is used to support translation device 310 for communication.
  • the memory 311 is configured to store program codes and data of the translation device 310.
  • Input device 314 is used to support external input of translation device 310.
  • Display 315 is used to support translation device 310 for display.
  • The processor 312 can be a central processing unit, a general purpose processor, a digital signal processor, an application specific integrated circuit, a field programmable gate array or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It can implement or execute the various illustrative logical blocks, modules and circuits described in connection with the present disclosure.
  • the processor may also be a combination of computing functions, for example, including one or more microprocessor combinations, combinations of digital signal processors and microprocessors, and the like.
  • The bus 316 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like.
  • In the embodiment of the present invention, the translation apparatus determines the first named entity set in the statement to be translated expressed in the first language and the entity type of each first named entity, and determines the second named entity corresponding to each first named entity according to the named entity correspondence obtained by training on the training corpus, which improves the accuracy of named entity translation. It determines the source semantic template of the statement to be translated and obtains the corresponding target semantic template according to the semantic template correspondence obtained by training on the training corpus, which ensures the accuracy of semantic template translation. It then determines the target translation statement based on the second named entity set and the target semantic template, inserting the determined named entities expressed in the second language into the corresponding positions of the target semantic template. This ensures that the statement to be translated is translated accurately, so that the specified dialog task can proceed correctly and smoothly.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Machine Translation (AREA)

Abstract

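A translation method and apparatus, relating to the field of machine translation, which address the poor translation quality and low accuracy of dialogue-oriented translation. The method includes: obtaining a statement to be translated, where the statement to be translated is a statement expressed in a first language in a specified dialog task; determining a first named entity set in the statement to be translated, and an entity type of each first named entity in the first named entity set; determining, according to the first named entity set and the entity type of each first named entity, a second named entity set expressed in a second language; determining a source semantic template of the statement to be translated, and obtaining a target semantic template corresponding to the source semantic template from a semantic template correspondence corresponding to the specified dialog task, where the semantic template correspondence is a correspondence between semantic templates expressed in the first language and semantic templates expressed in the second language; and determining a target translation statement according to the second named entity set and the target semantic template.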

Description

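A Translation Method and Apparatus
This application claims priority to Chinese Patent Application No. 201710097655.9, filed with the Chinese Patent Office on February 22, 2017 and entitled "一种翻译方法及装置" (A Translation Method and Apparatus), which is incorporated herein by reference in its entirety.
Technical Field
Embodiments of the present invention relate to the field of machine translation, and in particular, to a translation method and apparatus.
Background
Machine translation is the process of using a computer to convert a statement in one language into a statement in another language. By translation method, machine translation can be broadly divided into statistics-based machine translation and analysis-based machine translation. Analysis-based machine translation performs morpheme, syntactic, and semantic analysis on a statement in a first language, converts the resulting structure into a second language, and then generates the corresponding statement in the second language.
At present, research on and application of analysis-based machine translation are limited to the general domain and a few restricted domains, for example one-way translation such as news and translation software, whereas dialogue-oriented two-way translation is comparatively rare. Unlike other application domains, dialogue-oriented machine translation has its own characteristics. Colloquial speech in dialogue scenarios makes statements short and causes many syntactic components to be omitted, so that the expression of information is diverse and ambiguous. For example, in a hotel reservation dialogue, "single" and "twin" refer to specific room types, namely a single room (单人房) and a twin room (双人房). When translation software is used, however, "single" is translated as "one" (一个), "singles" (单打), or "unmarried" (单身的), and "twin" is translated as "twins" (双胞胎), which differ entirely from the intended room types. Common machine translation methods therefore translate poorly in the dialogue domain, with low accuracy.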
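Summary
Embodiments of the present invention provide a translation method and apparatus, which address the poor translation quality and low accuracy of dialogue-oriented translation in the prior art.
To achieve the foregoing objective, the embodiments of the present invention adopt the following technical solutions:
According to a first aspect, a translation method is provided. The method includes: obtaining a statement to be translated, where the statement to be translated is a statement expressed in a first language in a specified dialog task; determining a first named entity set in the statement to be translated, and an entity type of each first named entity in the first named entity set, where the first named entity set includes at least one first named entity; determining, according to the first named entity set and the entity type of each first named entity, a second named entity set expressed in a second language, where the second named entity set includes at least one second named entity, and the at least one second named entity corresponds to the at least one first named entity; determining a source semantic template of the statement to be translated, and obtaining a target semantic template corresponding to the source semantic template from a semantic template correspondence corresponding to the specified dialog task, where the semantic template correspondence is a correspondence between semantic templates expressed in the first language and semantic templates expressed in the second language; and determining a target translation statement according to the second named entity set and the target semantic template, where the target translation statement is the translated statement, expressed in the second language, that corresponds to the statement to be translated.
In the foregoing technical solution, for dialogue-oriented translation, the first named entity set in the statement to be translated and the entity type of each first named entity are determined with respect to the specified dialog task to which the statement belongs, and the corresponding second named entity set is determined. After the source semantic template of the statement to be translated is determined, the corresponding target semantic template is obtained according to the semantic template correspondence corresponding to the specified dialog task, and the target translation statement is then determined according to the second named entity set and the target semantic template. The statement to be translated can thus be translated in a targeted manner that combines the characteristics of the specified dialog task with a semantic understanding of the dialogue content, which improves translation quality and ensures relatively high translation accuracy.
In a possible implementation of the first aspect, determining, according to the first named entity set and the entity type of each first named entity, the second named entity set expressed in the second language includes: for each first named entity in the first named entity set, obtaining, according to the entity type of the first named entity, the second named entity corresponding to the first named entity from a named entity correspondence corresponding to the specified dialog task, to obtain the second named entity set, where the named entity correspondence is a correspondence between named entities expressed in the first language and named entities expressed in the second language. In this implementation, obtaining the second named entity from the named entity correspondence corresponding to the specified dialog task improves the accuracy of translating the first named entity, and avoids translation errors or heavy ambiguity caused by the first named entity having multiple meanings.
In a possible implementation of the first aspect, the method further includes: determining, according to a training corpus corresponding to the specified dialog task, the named entity correspondence corresponding to the specified dialog task, where the training corpus includes at least a training corpus expressed in the first language and a corresponding training corpus expressed in the second language. In this implementation, training on the training corpus corresponding to the specified dialog task yields the named entity correspondence for that task, which ensures that the translations between named entities in the correspondence are valid and accurate, and in turn improves the accuracy of the second named entity determined from the correspondence.
In a possible implementation of the first aspect, the method further includes: determining, according to the training corpus corresponding to the specified dialog task, the semantic template correspondence corresponding to the specified dialog task, where the training corpus includes at least a training corpus expressed in the first language and a corresponding training corpus expressed in the second language. In this implementation, training on the training corpus corresponding to the specified dialog task yields the semantic template correspondence for that task, which ensures that the translations between semantic templates in the correspondence are valid and accurate, and in turn improves the accuracy of translating the source semantic template according to the correspondence.
In a possible implementation of the first aspect, the method further includes: displaying first semantic information, where the first semantic information includes the first named entity set and the entity type corresponding to each first named entity; and/or displaying second semantic information, where the second semantic information includes the second named entity set and the entity type corresponding to each second named entity. In this implementation, displaying the first semantic information and/or the second semantic information enables the dialogue participants in the specified dialog task to fully understand each other's intent and meaning from the displayed first language information and/or second language information, which ensures the accuracy of the translation and the correctness of the specified dialog task.
In a possible implementation of the first aspect, after the first semantic information and/or the second semantic information is displayed, the method further includes: if a modification instruction is received, obtaining the modified statement of the statement to be translated, and translating the modified statement. In this implementation, when a dialogue participant triggers a modification operation, the participant can modify the statement to be translated so that its meaning is clearer and its grammatical components are more complete. The machine translation system then re-acquires the modified statement and translates it, which further ensures the accuracy of the translation and the smooth progress of the specified dialog task.
In a possible implementation of the first aspect, the method further includes: if it is determined that a first named entity in the first named entity set does not exist in the named entity correspondence corresponding to the specified dialog task, obtaining, according to the entity type of the first named entity, a third named entity that is expressed in the second language and corresponds to the first named entity; and updating the named entity correspondence corresponding to the specified dialog task according to the entity type of the first named entity, the first named entity, and the third named entity. In this implementation, when a first named entity in the first named entity set does not exist in the named entity correspondence corresponding to the specified dialog task, the third named entity can be obtained by receiving a manually input third named entity or by translating through a built-in dictionary interface, and the named entity correspondence is updated so that the entry can be used directly in subsequent translation, which improves the efficiency of subsequent translation.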
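According to a second aspect, a translation apparatus is provided. The apparatus includes: an obtaining unit, configured to obtain a statement to be translated, where the statement to be translated is a statement expressed in a first language in a specified dialog task; a first determining unit, configured to determine a first named entity set in the statement to be translated, and an entity type of each first named entity in the first named entity set, where the first named entity set includes at least one first named entity; the first determining unit being further configured to determine, according to the first named entity set and the entity type of each first named entity, a second named entity set expressed in a second language, where the second named entity set includes at least one second named entity, and the at least one second named entity corresponds to the at least one first named entity; a second determining unit, configured to determine a source semantic template of the statement to be translated, and obtain a target semantic template corresponding to the source semantic template from a semantic template correspondence corresponding to the specified dialog task, where the semantic template correspondence is a correspondence between semantic templates expressed in the first language and semantic templates expressed in the second language; and a translation unit, configured to determine a target translation statement according to the second named entity set and the target semantic template, where the target translation statement is the translated statement, expressed in the second language, that corresponds to the statement to be translated.
In a possible implementation of the second aspect, the first determining unit is specifically configured to: for each first named entity in the first named entity set, obtain, according to the entity type of the first named entity, the second named entity corresponding to the first named entity from a named entity correspondence corresponding to the specified dialog task, to obtain the second named entity set, where the named entity correspondence is a correspondence between named entities expressed in the first language and named entities expressed in the second language.
In a possible implementation of the second aspect, the apparatus further includes: a training unit, configured to determine, according to a training corpus corresponding to the specified dialog task, the named entity correspondence corresponding to the specified dialog task, where the training corpus includes at least a training corpus expressed in the first language and a corresponding training corpus expressed in the second language.
In a possible implementation of the second aspect, the apparatus further includes: a training unit, configured to determine, according to the training corpus corresponding to the specified dialog task, the semantic template correspondence corresponding to the specified dialog task, where the training corpus includes at least a training corpus expressed in the first language and a corresponding training corpus expressed in the second language.
In a possible implementation of the second aspect, the apparatus further includes: a display unit, configured to display first semantic information, where the first semantic information includes the first named entity set and the entity type corresponding to each first named entity; and/or the display unit is further configured to display second semantic information, where the second semantic information includes the second named entity set and the entity type corresponding to each second named entity.
In a possible implementation of the second aspect, the obtaining unit is further configured to: if the translation apparatus receives a modification instruction, obtain the modified statement of the statement to be translated, and translate the modified statement.
In a possible implementation of the second aspect, the obtaining unit is further configured to: if it is determined that a first named entity in the first named entity set does not exist in the named entity correspondence corresponding to the specified dialog task, obtain, according to the entity type of the first named entity, a third named entity that is expressed in the second language and corresponds to the first named entity. The apparatus further includes: an update unit, configured to update the named entity correspondence corresponding to the specified dialog task according to the entity type of the first named entity, the first named entity, and the third named entity.
According to a third aspect, a translation apparatus is provided. The translation apparatus includes a memory, a processor, a bus, and a communication interface. The memory stores code and data, the processor is connected to the memory through the bus, and the processor runs the code in the memory so that the translation apparatus performs the translation method provided by the first aspect or any possible implementation of the first aspect.
A further aspect of this application provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to perform the methods described in the foregoing aspects.
A further aspect of this application provides a computer program product containing instructions that, when run on a computer, cause the computer to perform the methods described in the foregoing aspects.
It can be understood that the apparatus, computer storage medium, and computer program product provided above are all used to perform the corresponding method provided above. For the beneficial effects they can achieve, refer to the beneficial effects of the corresponding method; details are not repeated here.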
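Brief Description of the Drawings
Figure 1 is a schematic structural diagram of a machine translation system according to an embodiment of the present invention;
Figure 2 is a schematic structural diagram of another machine translation system according to an embodiment of the present invention;
Figure 3 is a flowchart of a translation method according to an embodiment of the present invention;
Figure 4 is a flowchart of another translation method according to an embodiment of the present invention;
Figure 5 is a schematic diagram of a translation example according to an embodiment of the present invention;
Figure 6 is a flowchart of still another translation method according to an embodiment of the present invention;
Figure 7 is a schematic diagram of training on a training corpus according to an embodiment of the present invention;
Figure 8 is a schematic comparison of translation results according to an embodiment of the present invention;
Figure 9 is a schematic structural diagram of a translation apparatus according to an embodiment of the present invention;
Figure 10 is a schematic structural diagram of another translation apparatus according to an embodiment of the present invention.
Detailed Description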
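Before this application is described, the technical terms it uses are explained.
A specified dialogue task is a task in dialogue-oriented translation, and may be any one of a set of preset tasks. In a preset task, the dialogue parties converse in different languages and the dialogue content falls within a certain scope. For example, preset tasks may include international hotel booking, online shopping, health consultation, restaurant reservation, flight booking, and video or telephone conferencing.
A training corpus consists of sentence pairs in different languages that are used for training. A sentence pair is the same meaning expressed as sentences in different languages, for example the Chinese-English pair (我是515房间的艾米哈里斯, I am Amy Harris from Room 515). A training corpus may cover two languages, or three or more languages. A training corpus for a specified dialogue task consists of sentence pairs in different languages whose content or meaning relates to that task.
A named entity is a piece of key information in a sentence, such as a specific number, person name, place name, or organization name. For example, in the sentence 我是515房间的艾米哈里斯 ("I am Amy Harris from Room 515"), the named entities may include 515 and 艾米哈里斯 (Amy Harris).
The entity type of a named entity indicates an attribute of the named entity. In the embodiments of the present invention, the entity type is closely tied to the sentence containing the named entity and to the specified dialogue task to which that sentence belongs, so the entity type further indicates what the named entity means in its sentence. For example, the entity type of the named entity "515" may be room number, and the entity type of the named entity "艾米哈里斯" may be customer name.
A named entity correspondence is a correspondence, under the same entity type, between expressions of the same named entity in different languages. For example, the named entity correspondence for a hotel booking task may be as shown in Table 1, which uses a correspondence between named entities expressed in Chinese and named entities expressed in English as an example.
Table 1
Figure PCTCN2017112384-appb-000001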
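Because Table 1 survives only as an image reference, the following is a minimal sketch of how such a named entity correspondence might be held in memory, assuming a nested mapping keyed first by entity type. The name `NE_CORRESPONDENCE`, the function `lookup_entity`, and the choice of entries (taken from examples elsewhere in the description) are illustrative assumptions, not the layout used in the publication.

```python
# Hypothetical in-memory form of a named entity correspondence for one
# dialogue task. Outer keys are entity types; inner dictionaries map a
# first-language (Chinese) entity to its second-language (English) entity.
NE_CORRESPONDENCE = {
    "Room Type": {"单人房": "Single Room", "双人房": "Twin Room"},
    "Hotel Name": {"希尔顿酒店": "Hilton Hotel"},
}


def lookup_entity(entity, entity_type, table=NE_CORRESPONDENCE):
    """Return the second-language entity, or None when no entry exists."""
    return table.get(entity_type, {}).get(entity)
```

Keying by entity type first means the same surface string can map to different translations under different types, which is the disambiguation the description attributes to entity types.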
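A semantic template is a sentence template that expresses a particular meaning. The content of the specific named entities in a semantic template may be omitted, or may be replaced by the entity types of those named entities. For example, the semantic template for "I am Amy Harris from Room 515" may be: I am <Customer Name> from <Room NO.>.
A semantic template correspondence is a correspondence between semantic templates that express the same meaning in different languages. For example, the semantic template correspondence for a hotel booking task may be as shown in Table 2, which uses a correspondence between semantic templates expressed in Chinese and semantic templates expressed in English as an example.
Table 2
Figure PCTCN2017112384-appb-000002
Semantic information is information that represents the meaning of a sentence through preset entity types and the named entities contained in the sentence. It may include information about the relationship between at least one preset entity type and the entity type of at least one named entity contained in a sentence. First semantic information is information expressed in the first language, and second semantic information is information expressed in the second language.
For example, in a hotel booking task where the first language is Chinese and the second language is English, the first semantic information may be as shown in Table 3 and the second semantic information as shown in Table 4. Tables 3 and 4 use the dialogue sentences "hello,this is Hilton Hotel" and "我想预定一间9月11日的双人房" ("I'd like to book a twin room on 11th of September") as an example. The named entity in the first sentence is Hilton Hotel, and the named entities in the second sentence are 9月11日 (11th of September) and 双人房 (twin room). The entity type of Hilton Hotel is Hotel Name, the entity type of 9月11日 is date, and the entity type of 双人房 is room type.
Table 3 First semantic information
Item (preset entity type) | Content (named entity)
酒店名称 (hotel name) | 希尔顿酒店
客户姓名 (customer name) |
地址 (address) |
电话号码 (telephone number) |
入住日期 (check-in date) | 9月11日
房间类型 (room type) | 双人房
…… | ……
Table 4 Second semantic information
Item | Content
Hotel Name | Hilton Hotel
Customer Name |
Address |
Tele.NO. |
Date | 11th of September
Room Type | Twin Room
…… | ……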
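The form in Tables 3 and 4 can be read as a fixed list of preset entity types whose content cells are filled in as entities are recognized. The sketch below illustrates that reading under stated assumptions: `PRESET_TYPES` and `build_semantic_info` are hypothetical names, and the rule that a later entity of the same type overwrites an earlier one is a simplification the description does not specify.

```python
# Hypothetical preset entity types for the second-language view (Table 4).
PRESET_TYPES = ["Hotel Name", "Customer Name", "Address",
                "Tele.NO.", "Date", "Room Type"]


def build_semantic_info(recognized, preset_types=PRESET_TYPES):
    """Build (item, content) rows from recognized (entity, entity_type) pairs.

    Preset types with no recognized entity keep an empty content cell.
    """
    info = {entity_type: "" for entity_type in preset_types}
    for entity, entity_type in recognized:
        if entity_type in info:
            info[entity_type] = entity  # assumption: latest mention wins
    return list(info.items())
```

Calling `build_semantic_info` with the three entities of the example dialogue yields rows in the order of `PRESET_TYPES`, with the customer name, address, and telephone number cells left empty.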
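Figure 1 is a schematic structural diagram of a machine translation system according to an embodiment of the present invention. Referring to Figure 1, the machine translation system includes a first determining unit 101, a second determining unit 102, and a translation unit 103.
The first determining unit 101 is configured to recognize and translate the named entities in a sentence. Specifically, during named entity training, it may be used to recognize named entities in the training corpus of any specified dialogue task, define their entity types, and store corresponding named entities of the same entity type in the named entity correspondence. For example, the first determining unit 101 may recognize named entities and define their entity types through supervised learning, domain adaptation, and rule-based methods. When a sentence is translated, the first determining unit 101 may be used to recognize the named entities in the sentence and their entity types, and to translate each named entity into the corresponding named entity in the target language according to the named entity correspondence. In addition, because the named entities recognized from the training corpus may not cover all named entities, the first determining unit 101 may also be provided with a dictionary interface, so that user-defined named entities can be added through dictionary-based named entity recognition.
The second determining unit 102 is configured to perform semantic analysis on a sentence to obtain a semantic template. A semantic template can take various forms, such as a logical expression or a sentence pattern. For example, the second determining unit 102 may use supervised learning or rules to build the semantic template from the context, using the named entities and entity types recognized by the first determining unit 101.
The translation unit 103 is configured to translate the to-be-translated sentence into the target translated sentence. The to-be-translated sentence may be a sentence expressed in the first language, for example a sentence in Chinese. The target translated sentence may be a sentence expressed in the second language that has the same meaning as the to-be-translated sentence, for example a sentence in English. After translation, the machine translation system may also present the target translated sentence to the user, for example by displaying it or by playing it as speech or video.
Figure 2 is a schematic structural diagram of another machine translation system according to an embodiment of the present invention. This machine translation system translates between two languages. For ease of description, the side handling the first language is called the source end and the side handling the second language is called the target end. The machine translation system may then include a source-end first determining unit 111, a source-end second determining unit 112, a target-end first determining unit 121, a target-end second determining unit 122, and a translation unit 130. The source-end first determining unit 111 and the source-end second determining unit 112 are configured to recognize and analyze sentences and corpora expressed in the first language to obtain the named entities and semantic templates expressed in the first language. The target-end first determining unit 121 and the target-end second determining unit 122 are configured to recognize and analyze sentences and corpora expressed in the second language to obtain the named entities and semantic templates expressed in the second language. The translation unit 130 is configured to translate sentences between the two languages.
The basic principle of the embodiments of the present invention is as follows. Recognition training for named entities and entity types is performed on a training corpus that is based on a dialogue task, which yields the named entity correspondence; recognition training for semantic templates is performed on the same corpus, which yields the semantic template correspondence. When a to-be-translated sentence is translated, its named entities are recognized and translated, its source semantic template is identified, and the corresponding target semantic template is obtained. The translated named entities are then filled into the target semantic template according to their entity types, which completes the translation. Translating the named entities and the source semantic template through the named entity correspondence and the semantic template correspondence improves the accuracy of both. In addition, displaying the semantic information lets both dialogue parties fully understand each other's meaning, which helps ensure the accuracy of the target translated sentence and raises the success rate of completing the task.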
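Figure 3 is a flowchart of a translation method according to an embodiment of the present invention. Referring to Figure 3, the method may include the following steps.
Step 201: Obtain a to-be-translated sentence, where the to-be-translated sentence is a sentence expressed in a first language in a specified dialogue task.
A specified dialogue task is a task in dialogue-oriented translation, and may be any one of a set of preset tasks. In a preset task, the dialogue parties converse in different languages and the dialogue content falls within a certain scope. For example, preset tasks may include international hotel booking, online shopping, health consultation, restaurant reservation, flight booking, and video or telephone conferencing.
In a specified dialogue task, the two parties communicate in different languages. So that they can converse without barriers, the machine translation system shown in Figure 1 may be used to translate their sentences, that is, to convert a sentence in one language into a sentence in the other language. A sentence in the dialogue between the two parties is called a to-be-translated sentence. The languages used by the two parties may be called the first language and the second language, and they are different languages. For example, the first language may be Chinese and the second language English, or the first language may be English and the second language Chinese.
Step 202: Determine a first named entity set in the to-be-translated sentence and the entity type of each first named entity in the first named entity set, where the first named entity set includes at least one first named entity.
After obtaining the to-be-translated sentence, the machine translation system may perform a series of analyses on it, such as morphological, syntactic, and semantic analysis, and perform recognition using techniques such as domain adaptation, transfer learning, semi-supervised learning, and rules. This yields the first named entity set in the to-be-translated sentence and the entity type of each first named entity in that set.
The entity type determined for a first named entity in the embodiments of the present invention differs substantially from the entity type determined in the prior art. In the prior art, the determined entity type is a broad category. For a numeric named entity, for example, existing named entity recognition methods can only identify the entity type as a number, and do not further determine whether the number is a telephone number, a credit card number, a room card number, or the like. The entity type determined in the embodiments of the present invention is the result of this further determination: it is closely tied to the to-be-translated sentence and to the specified dialogue task, and therefore further indicates what the first named entity means in the to-be-translated sentence.
For example, if the to-be-translated sentence is 我是515房间的艾米哈里斯 ("I am Amy Harris from Room 515"), the first named entities may include 515 and 艾米哈里斯. In the prior art, the entity type determined for the first named entity "515" is number and the entity type determined for "艾米哈里斯" is person name. In the embodiments of the present invention, the entity type determined for "515" is room number and the entity type determined for "艾米哈里斯" is customer name.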
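To make the contrast with coarse types concrete, the sketch below tags entities with task-specific types using context rules only. It is a deliberately small stand-in: the description also mentions domain adaptation, transfer learning, and semi-supervised learning, none of which is modeled here, and the patterns in `RULES`, the English type labels, and the function `tag_entities` are hypothetical choices for a hotel booking task.

```python
import re

# Hypothetical context rules for a hotel booking task: (pattern, entity type).
# A bare number would only be typed "number" by a generic recognizer; here the
# following word 房间 ("room") refines it to a room number.
RULES = [
    (re.compile(r"(\d+)房间"), "Room NO."),
    (re.compile(r"(\d{1,2}月\d{1,2}日)"), "Date"),
]


def tag_entities(sentence):
    """Return (entity, entity_type) pairs found by the rules, in rule order."""
    found = []
    for pattern, entity_type in RULES:
        for match in pattern.finditer(sentence):
            found.append((match.group(1), entity_type))
    return found
```

Person names such as 艾米哈里斯 are not covered by these two rules; recognizing them would need a name lexicon or a trained tagger.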
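Step 203: Determine, according to the first named entity set and the entity type of each first named entity, a second named entity set expressed in a second language, where the second named entity set includes at least one second named entity, and the at least one second named entity corresponds to the at least one first named entity.
After the first named entity set and the entity type of each first named entity are determined, for each first named entity in the set, the machine translation system may determine, according to the entity type of that first named entity, the corresponding second named entity expressed in the second language. For the at least one first named entity in the first named entity set, this yields the second named entity set, which includes at least one second named entity.
It should be noted that the correspondence between the at least one first named entity and the at least one second named entity may be one-to-one or one-to-many; this is not limited in the embodiments of the present invention. When one first named entity corresponds to two or more second named entities, those second named entities have the same meaning and are merely different expressions, such as first and NO.1.
Specifically, when determining the second named entity set according to the first named entity set and the entity type of each first named entity, the machine translation system may directly translate each first named entity in the set into a second named entity, thereby obtaining the second named entity set. For example, if the first named entity is 9月11日 and its entity type is date, the second named entity translated from it is 11th of September.
Alternatively, for each first named entity in the first named entity set, the machine translation system obtains, according to the entity type of the first named entity, the corresponding second named entity from the named entity correspondence of the specified dialogue task, thereby obtaining the second named entity set. For example, if the named entity correspondence of the specified dialogue task is as shown in Table 1, the first named entity is 单人房, and its entity type is room type, the machine translation system uses the room type to obtain Single Room from the correspondence in Table 1 as the second named entity for 单人房.
Alternatively, a first part of the first named entities in the set is translated directly, and a second part is looked up in the named entity correspondence of the specified dialogue task, thereby obtaining the second named entity set. The first part may be named entities that are common and easy to translate, such as times and dates, and the second part may be fixed phrases, such as hotel names and room types.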
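The third option above, direct translation for easy entities and table lookup for fixed phrases, can be sketched as a dispatch on entity type. This is an illustrative assumption about how the split might be made: only the `Date` type is translated directly, the date format handled is limited to `M月D日`, and the names `translate_date` and `translate_entity` are hypothetical.

```python
import re

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def translate_date(entity):
    """Directly translate a Chinese 'M月D日' date; None if it does not parse."""
    match = re.fullmatch(r"(\d{1,2})月(\d{1,2})日", entity)
    if match is None:
        return None
    month, day = int(match.group(1)), int(match.group(2))
    if not 1 <= month <= 12:
        return None
    return f"{ordinal(day)} of {MONTHS[month - 1]}"


def translate_entity(entity, entity_type, table):
    """Translate dates directly; look every other type up in the table."""
    if entity_type == "Date":
        return translate_date(entity)
    return table.get(entity_type, {}).get(entity)
```

With this split, dates never need an entry in the correspondence, while hotel names and room types always go through the task-specific table.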
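Further, when the machine translation system obtains the second named entity for a first named entity from the named entity correspondence of the specified dialogue task according to the entity type of the first named entity, it may first query whether the first named entity exists in the named entity correspondence. If it exists, the system obtains the corresponding second named entity from the named entity correspondence according to the entity type of the first named entity. If it does not exist, the system obtains, according to the entity type of the first named entity, a third named entity that is expressed in the second language and corresponds to the first named entity. Methods for obtaining the third named entity may include receiving a manually entered third named entity, or obtaining a dictionary translation through a built-in dictionary interface.
After the machine translation system obtains the third named entity for the first named entity, it may update the named entity correspondence of the specified dialogue task according to the entity type of the first named entity, the first named entity, and the third named entity, so that the pair can be used directly in later translations. Specifically, according to the entity type of the first named entity, the machine translation system stores the first named entity and the third named entity in the correspondence between first-language named entities and second-language named entities within the named entity correspondence.
For example, when the named entity correspondence is as shown in Table 1, the first named entity is 京都观光饭店 with the entity type hotel name, and the corresponding third named entity is Kyoto Hotel, the machine translation system updates the named entity correspondence of the specified dialogue task according to the entity type of the first named entity, the first named entity, and the third named entity, as shown in Table 5.
Table 5
Figure PCTCN2017112384-appb-000003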
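The query, fallback, and update sequence described above can be sketched as one function. This is a minimal sketch under stated assumptions: the correspondence is a nested dictionary keyed by entity type, and `fallback` is a hypothetical callable standing in for either manual input or a dictionary interface; neither name comes from the publication.

```python
def resolve_entity(entity, entity_type, table, fallback):
    """Look an entity up; on a miss, obtain a third named entity and store it.

    table: {entity_type: {first_language_entity: second_language_entity}}
    fallback: callable (entity, entity_type) -> translation or None, standing
        in for manual input or a built-in dictionary interface.
    """
    by_type = table.setdefault(entity_type, {})
    if entity in by_type:
        return by_type[entity]
    third_entity = fallback(entity, entity_type)
    if third_entity is not None:
        by_type[entity] = third_entity  # reuse directly in later translations
    return third_entity
```

After the first miss for 京都观光饭店, the table holds the new pair under the hotel name type, so a second call returns Kyoto Hotel without consulting the fallback.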
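Step 204: Determine a source semantic template of the to-be-translated sentence, and obtain, from the semantic template correspondence of the specified dialogue task, a target semantic template corresponding to the source semantic template, where the semantic template correspondence is a correspondence between semantic templates expressed in the first language and semantic templates expressed in the second language.
Determining the source semantic template of the to-be-translated sentence may specifically be done as follows: after the first named entity set in the to-be-translated sentence and the entity type of each first named entity are determined, the first named entities in the sentence are deleted or replaced with their entity types, which yields the source semantic template. After the source semantic template is determined, the corresponding target semantic template can be obtained from the semantic template correspondence of the specified dialogue task.
For example, if the semantic template correspondence of the specified dialogue task is as shown in Table 2 and the to-be-translated sentence is 我是515房间的艾米哈里斯, the determined source semantic template may be 我是<房间号>的<客户姓名>, and the target semantic template obtained from Table 2 for this source semantic template is: I am <Customer Name> from <Room NO.>.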
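Step 204 can be sketched as a string substitution followed by a dictionary lookup. The sketch below uses the booking sentence from the description rather than the room number sentence, because its template is given verbatim in the text; `to_template` and the single entry in `TEMPLATE_CORRESPONDENCE` are hypothetical, since Table 2 itself is only available as an image.

```python
def to_template(sentence, entities):
    """Replace each entity in the sentence with a <entity type> placeholder.

    entities: {entity text: entity type}. Longer entities are replaced first
    so that a short entity cannot clobber part of a longer one.
    """
    for text in sorted(entities, key=len, reverse=True):
        sentence = sentence.replace(text, f"<{entities[text]}>")
    return sentence


# Hypothetical semantic template correspondence for a hotel booking task.
TEMPLATE_CORRESPONDENCE = {
    "我想预定一间<日期>的<房间类型>": "I'd like to book a <Room Type> on <Date>",
}

source_template = to_template(
    "我想预定一间9月11日的双人房",
    {"9月11日": "日期", "双人房": "房间类型"},
)
target_template = TEMPLATE_CORRESPONDENCE.get(source_template)
```

Exact string matching on the source template is the simplest possible lookup; a source template that differs by even one character returns `None`, which is one reason the description later normalizes templates.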
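Step 205: Determine a target translated sentence according to the second named entity set and the target semantic template, where the target translated sentence is the translated sentence that is expressed in the second language and corresponds to the to-be-translated sentence.
After determining the second named entity set and the target semantic template, the machine translation system may fill each second named entity in the second named entity set into the corresponding position in the target semantic template according to its entity type, which yields the target translated sentence. The entity type of a second named entity is the same as that of its corresponding first named entity.
For example, the to-be-translated sentence is 我想预定一间9月11日的双人房. The first named entity set determined through steps 201 to 204 includes 9月11日 (date) and 双人房 (room type), the corresponding second named entity set includes 11th of September (Date) and Twin Room (Room Type), and the target semantic template is I'd like to book a <Room Type> on <Date>. The target translated sentence determined from the second named entity set and the target semantic template is then: I'd like to book a twin room on 11th of September.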
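Filling the target semantic template amounts to replacing each `<entity type>` placeholder with the second named entity of that type. The following is a minimal sketch of that step; `fill_template` is a hypothetical name, and raising `KeyError` for an unfilled placeholder is a design choice of the sketch rather than behavior stated in the description.

```python
import re


def fill_template(template, entities_by_type):
    """Replace every <entity type> placeholder with the entity of that type.

    Raises KeyError if the template needs a type that was not recognized,
    so an incomplete sentence is never returned silently.
    """
    def substitute(match):
        return entities_by_type[match.group(1)]

    return re.sub(r"<([^<>]+)>", substitute, template)
```

Because placement is driven by the placeholder and not by word order, the reordering between the Chinese and English sentences (date before room type versus room type before date) is handled entirely by the template pair.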
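Finally, after determining the target translated sentence, the machine translation system may present it to the participants in the specified dialogue task. For example, after a to-be-translated sentence in Chinese is translated into a target translated sentence in English, the English sentence is presented to the participant who converses in English, which facilitates communication between the participants in the specified dialogue task.
In this embodiment of the present invention, the first named entity set in the to-be-translated sentence expressed in the first language and the entity type of each first named entity are determined, and the second named entity corresponding to each first named entity is determined to obtain the second named entity set. The source semantic template of the to-be-translated sentence is determined and the corresponding target semantic template is obtained. The target translated sentence is then determined from the second named entity set and the target semantic template, that is, the determined second-language named entities are inserted into the corresponding positions of the target semantic template. This improves the accuracy of translating both the named entities and the semantic template, and thereby ensures the accuracy of sentence translation and the smooth progress of the specified dialogue task.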
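Further, referring to Figure 4, the method also includes step 202a and/or step 203a. Step 202a may follow step 202 and may be performed in any order relative to steps 203 to 205. Step 203a may follow step 203 and may be performed in any order relative to steps 204 and 205.
Step 202a: Display first semantic information, where the first semantic information includes the first named entity set and the entity type of each first named entity.
After determining the first named entity set in the to-be-translated sentence and the entity type of each first named entity, the machine translation system may present them to the participants in the specified dialogue task by displaying the first semantic information. Specifically, the machine translation system may present the first semantic information, which includes the first named entity set and the entity type of each first named entity, to the participant who uses the first language, so that this participant can confirm that it is correct.
Step 203a: Display second semantic information, where the second semantic information includes the second named entity set and the entity type of each second named entity.
Similarly, after determining the second named entity set, the machine translation system may present the second named entity set and the entity type of each second named entity to the participants in the specified dialogue task by displaying the second semantic information. Specifically, the machine translation system may present the second semantic information, which includes the second named entity set and the entity type of each second named entity, to the participant who uses the second language, so that this participant can confirm that it is correct.
In practice, the first semantic information and the second semantic information may cover one to-be-translated sentence in the specified dialogue task, or several. That is, the machine translation system may display the first semantic information and/or the second semantic information for each to-be-translated sentence in the dialogue, or for several to-be-translated sentences together; this is not limited in the embodiments of the present invention.
In addition, the first semantic information may include the content of the second semantic information, and the second semantic information may include the content of the first semantic information. That is, the machine translation system may present the content of both at the same time to one or more participants in the specified dialogue task, so that they can see the semantic information expressed in both the first language and the second language.
For example, as shown in Figure 5, if the first language is Chinese, the second language is English, and the to-be-translated sentence is 我想预定一间9月11日的双人房, then after the second named entity set is determined through steps 201 to 203, the machine translation system presents the second semantic information shown in Figure 5 to the dialogue participants.
Furthermore, the first semantic information and/or the second semantic information may include other information, which may be set according to the specific content of the specified dialogue task. For example, for a complete hotel booking dialogue, the second semantic information may be as shown in Table 6. Table 6 includes the first named entities in several to-be-translated sentences, the corresponding second named entities, and the entity types, and also groups this content into categories such as Hotel Information, Customer Information, Booking Information, and Others.
Table 6
Figure PCTCN2017112384-appb-000004
In this embodiment of the present invention, displaying the first semantic information and/or the second semantic information lets the participants in the specified dialogue task fully understand each other's intent and meaning from the displayed information, which helps ensure the accuracy of the translation and the correctness of the specified dialogue task.
Further, after the machine translation system displays the first semantic information and/or the second semantic information, the method also includes: if a modification instruction is received, obtaining the modified version of the to-be-translated sentence and translating the modified sentence.
After the machine translation system displays the first semantic information and/or the second semantic information, if a participant in the specified dialogue task determines that the displayed content is incorrect, the participant can trigger a modification instruction to the machine translation system through a specified operation. On receiving the modification instruction, the machine translation system may obtain the modified version of the to-be-translated sentence and translate it according to the method described in steps 202 to 205.
In this embodiment of the present invention, to-be-translated sentences in a specified dialogue task tend to be colloquial: they are short and may omit grammatical components. When a dialogue participant triggers a modification operation through the machine translation system, the participant can therefore revise the to-be-translated sentence so that its meaning is clearer and its grammatical components are more complete. The machine translation system then obtains the modified sentence and translates it, which helps ensure the accuracy of the translation and the smooth progress of the specified dialogue task.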
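Further, referring to Figure 6, the method also includes step 200a and/or step 200b. Steps 200a and 200b may precede step 202 and may be performed in either order. Figure 6 uses the case in which the method includes both step 200a and step 200b, and both precede step 201, as an example.
Step 200a: Determine the named entity correspondence of the specified dialogue task according to a training corpus of the specified dialogue task.
The training corpus includes at least a training corpus expressed in the first language and a corresponding training corpus expressed in the second language.
The training corpus may be dialogue material commonly used in the specified dialogue task, and may also include additional material related to the task. The additional material may be dialogues related to the specified dialogue task that are selected from a large amount of dialogue material by data filtering techniques, for example from multilingual dialogue material such as films and television series related to the task.
Specifically, when determining the named entity correspondence of the specified dialogue task from the training corpus, the machine translation system may train in several different ways. For example, it may perform supervised learning with a sequence labeling model, or combine this with methods such as domain adaptation and rule matching, to train on the corpus of the specified dialogue task and obtain the corresponding named entity correspondence.
Step 200b: Determine the semantic template correspondence of the specified dialogue task according to the training corpus of the specified dialogue task.
Specifically, when determining the semantic template correspondence of the specified dialogue task from the training corpus, the machine translation system may use methods such as supervised learning and rules, combined with information such as the dialogue state and entity types, to derive the semantic template correspondence from the training corpus. The dialogue state may be a category that represents the meaning of an utterance. For example, "请问您贵姓" ("May I have your surname?") and "您叫什么名字?" ("What is your name?") both aim to ask for the other party's name, so the same category (for example, asking for a name) can represent that meaning.
For example, take training between a Chinese training corpus and an English training corpus, with hotel booking as the specified dialogue task. The process by which the machine translation system obtains the named entity correspondence through step 200a and the semantic template correspondence through step 200b may be as shown in Figure 7. In Figure 7, the customer speaks English and the hotel agent speaks Chinese.
Taking the training corpus shown in Figure 7 as an example, the machine translation system trains on the English training corpus and obtains the English named entities Kyoto Hotel, twin room, and 11th of September. It trains on the corresponding Chinese training corpus and obtains the Chinese named entities 京都观光饭店, 9月11日, and 双人房. The entity type determined for Kyoto Hotel and 京都观光饭店 is Hotel Name, the entity type for twin room and 双人房 is Room Type, and the entity type for 11th of September and 9月11日 is Date. The named entity correspondence obtained by training may therefore be as shown in Table 7.
Table 7
Figure PCTCN2017112384-appb-000005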
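Once both sides of each sentence pair carry typed entities, one simple way to assemble a table like Table 7 is to pair entities of the same type within a sentence pair. The sketch below does only this pairing step and assumes the recognition has already been done; it also assumes both languages share one set of type labels and skips any pair where a type occurs more than once on the target side. `build_ne_correspondence` is a hypothetical name and this is not presented as the training procedure of the publication.

```python
def build_ne_correspondence(annotated_pairs):
    """Pair same-type entities across the two sides of each sentence pair.

    annotated_pairs: iterable of (source_entities, target_entities), each a
        list of (entity, entity_type) tuples for one side of a sentence pair.
    Returns {entity_type: {source_entity: set of target entities}}; sets are
    used so that one source entity may keep several equivalent translations.
    """
    table = {}
    for source_entities, target_entities in annotated_pairs:
        targets_by_type = {}
        for entity, entity_type in target_entities:
            targets_by_type.setdefault(entity_type, []).append(entity)
        for entity, entity_type in source_entities:
            candidates = targets_by_type.get(entity_type, [])
            if len(candidates) == 1:  # keep only unambiguous alignments
                table.setdefault(entity_type, {}).setdefault(
                    entity, set()).add(candidates[0])
    return table
```

Storing a set of targets per source entity leaves room for the one-to-many case, for example 双人房 mapping to both twin room and double room across different sentence pairs.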
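It should be noted that the machine translation system may leave out of the named entity correspondence those named entity pairs, and their entity types, that are easy to translate directly, such as the pair (9月11日, 11th of September) with the entity type Date. This saves memory in the machine translation system, and translating such entities directly according to the determined entity type does not affect translation accuracy. The machine translation system may, of course, also store them in the named entity correspondence; this is not specifically limited in the embodiments of the present invention.
In addition, the machine translation system trains on the English training corpus and obtains the English semantic templates "this is <Hotel Name> front desk" and "I'd like to book a <Room Type> on <Date>". It trains on the corresponding Chinese training corpus and obtains the Chinese semantic templates "这里是<酒店名称>前台" and "我想预定一间<日期>的<房间类型>". The semantic template correspondence obtained by training may therefore be as shown in Table 8.
Table 8
Figure PCTCN2017112384-appb-000006
When the machine translation system then translates a to-be-translated sentence in hotel booking, it may translate based on the named entity correspondence shown in Table 7 and the semantic template correspondence shown in Table 8, and present the translated target sentence to the customer and/or the agent.
In the embodiments of the present invention, the correspondence between first-language named entities and second-language named entities in the named entity correspondence may be one-to-one, one-to-many, many-to-one, or many-to-many; this is not limited in the embodiments of the present invention. For example, 双人房 may also be called 双床房, and the corresponding English may be twin room or double room. Similarly, the correspondence between first-language semantic templates and second-language semantic templates in the semantic template correspondence may be one-to-one, one-to-many, many-to-one, or many-to-many; this is not limited in the embodiments of the present invention.
Further, after determining the entity type of each named entity in the first named entity set of the to-be-translated sentence, the machine translation system may also normalize the entity type. Normalization means converting different descriptions of the same thing into a single description, which may be a formal written description or a commonly used one. For example, if the entity type of the first named entity 艾米哈里斯 is determined to be 客户名称 (client name), the machine translation system may normalize it to 客户姓名 (customer name).
Similarly, after determining the source semantic template of the to-be-translated sentence, the machine translation system may also normalize the source semantic template, that is, convert it into a formal written description or a commonly used one.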
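Normalization of entity types can be pictured as mapping every known variant label to one canonical label before any lookup is made. The alias table below is a hypothetical example built from the single case the description gives; `TYPE_ALIASES` and `normalize_type` are assumed names.

```python
# Hypothetical alias table: variant label -> canonical entity type label.
TYPE_ALIASES = {
    "客户名称": "客户姓名",
}


def normalize_type(entity_type, aliases=TYPE_ALIASES):
    """Return the canonical label; labels without an alias pass through."""
    return aliases.get(entity_type, entity_type)
```

Normalizing before the placeholder is written into the source semantic template keeps the template string stable, so the same sentence pattern always hits the same entry in the semantic template correspondence.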
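The method for determining named entities provided in the embodiments of the present invention was evaluated on a standard data set for its actual translation performance in hotel booking; the results are shown in Table 9. On the standard data set (test set), the embodiments of the present invention achieve high precision (P) and recall (R) for both named entity recognition and named entity translation, and the translation F-scores of 92.59% and 96.37% help ensure that the named entities passed into the machine translation system are reliable.
Table 9
Figure PCTCN2017112384-appb-000007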
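For readers unfamiliar with the metrics in Table 9, the sketch below computes precision, recall, and the balanced F-score (their harmonic mean) from raw counts. The publication does not state which F-score variant it reports, so the balanced F1 here is an assumption, and the counts in the usage note are made up for illustration and are not the figures behind Table 9.

```python
def precision_recall_f(correct, predicted, gold):
    """Compute precision, recall, and balanced F-score from counts.

    correct: number of predicted entities that are right
    predicted: number of entities the system output
    gold: number of entities in the reference annotation
    """
    precision = correct / predicted if predicted else 0.0
    recall = correct / gold if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score
```

With 8 correct entities out of 10 predicted and 16 in the reference, precision is 0.8, recall is 0.5, and the F-score lies between them, closer to the lower value.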
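In addition, on the standard data set (test set), the accuracy of the translation method provided in the embodiments of the present invention is compared with that of Google Translate and the 翻译官 translator in Figure 8. For English-to-Chinese translation, the accuracy of the method provided in the embodiments of the present invention is 30.7%, whereas Google Translate and 翻译官 reach only about 15%. For Chinese-to-English translation, the accuracy of the method is 20.3%, whereas Google Translate and 翻译官 reach only about 10%.
In this embodiment of the present invention, for a specified dialogue task, the first named entity set in the to-be-translated sentence expressed in the first language and the entity type of each first named entity are determined, and the second named entity corresponding to each first named entity is determined from the named entity correspondence obtained by training on the training corpus, which improves the accuracy of named entity translation. The source semantic template of the to-be-translated sentence is also determined, and the corresponding target semantic template is obtained from the semantic template correspondence obtained by training on the training corpus, which ensures the accuracy of semantic template translation. The target translated sentence is then determined from the second named entity set and the target semantic template, that is, the determined second-language named entities are inserted into the corresponding positions of the target semantic template. This ensures the accuracy of translating the to-be-translated sentence, so that the specified dialogue task can proceed correctly and smoothly.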
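The foregoing describes the solutions provided in the embodiments of the present invention mainly from the perspective of a device performing the translation method. It can be understood that, to implement the foregoing functions, a device (for example, a translation apparatus) includes the hardware structures and/or software modules that perform each function. A person skilled in the art will readily appreciate that, with reference to the devices and algorithm steps of the examples described in the embodiments disclosed herein, this application can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such an implementation should not be considered beyond the scope of this application.
In the embodiments of this application, the translation apparatus may be divided into functional modules according to the foregoing method examples. For example, each function may be assigned its own functional module, or two or more functions may be integrated into one processing module. The integrated module may be implemented in hardware or as a software functional module. It should be noted that the division into modules in the embodiments of this application is illustrative and is merely a division by logical function; other divisions are possible in an actual implementation.
Where each function is assigned its own functional module, Figure 9 shows a possible schematic structure of the translation apparatus in the foregoing embodiments. The translation apparatus 300 includes an obtaining unit 301, a first determining unit 302, a second determining unit 303, and a translation unit 304. The obtaining unit 301 is configured to perform step 201 in Figure 3, Figure 4, or Figure 6; the first determining unit 302 is configured to perform steps 202 and 203 in Figure 3, Figure 4, or Figure 6; the second determining unit 303 is configured to perform step 204 in Figure 3, Figure 4, or Figure 6; and the translation unit 304 is configured to perform step 205 in Figure 3, Figure 4, or Figure 6. Further, the translation apparatus 300 may also include a training unit 305, and/or a display unit 306, and/or an update unit 307. The training unit 305 is configured to perform steps 200a and 200b in Figure 6, and the display unit 306 is configured to perform steps 202a and 203a in Figure 4 and Figure 6. All related content of the steps in the foregoing method embodiments may be cited in the functional descriptions of the corresponding functional modules; details are not repeated here.
In a hardware implementation, the first determining unit 302, the second determining unit 303, the translation unit 304, the training unit 305, and the update unit 307 may be a processor, the obtaining unit 301 may be an input device (such as a keyboard or a touchscreen), and the display unit 306 may be a display.
Figure 10 shows a possible schematic logical structure of the translation apparatus 310 in the foregoing embodiments, according to an embodiment of the present invention. The translation apparatus 310 includes a processor 312, a communication interface 313, a memory 311, an input device 314, a display 315, and a bus 316. The processor 312, the communication interface 313, the memory 311, the input device 314, and the display 315 are connected to one another through the bus 316. In this embodiment of the present invention, the processor 312 is configured to control and manage the actions of the translation apparatus 310. For example, the processor 312 is configured to perform steps 202 to 205 in Figure 3, Figure 4, or Figure 6, steps 202a and 203a in Figure 4 or Figure 6, steps 200a and 200b in Figure 6, and/or other processes of the techniques described herein. The communication interface 313 is configured to support communication by the translation apparatus 310. The memory 311 is configured to store the program code and data of the translation apparatus 310. The input device 314 is configured to support external input to the translation apparatus 310. The display 315 is configured to support display by the translation apparatus 310.
The processor 312 may be a central processing unit, a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various illustrative logical blocks, modules, and circuits described in connection with the disclosure of this application. The processor may also be a combination that implements a computing function, for example a combination of one or more microprocessors, or a combination of a digital signal processor and a microprocessor. The bus 316 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in Figure 10, but this does not mean that there is only one bus or one type of bus.
In this embodiment of the present invention, for a specified dialogue task, the translation apparatus determines the first named entity set in the to-be-translated sentence expressed in the first language and the entity type of each first named entity, and determines the second named entity corresponding to each first named entity from the named entity correspondence obtained by training on the training corpus, which improves the accuracy of named entity translation. It also determines the source semantic template of the to-be-translated sentence and obtains the corresponding target semantic template from the semantic template correspondence obtained by training on the training corpus, which ensures the accuracy of semantic template translation. It then determines the target translated sentence from the second named entity set and the target semantic template, that is, it inserts the determined second-language named entities into the corresponding positions of the target semantic template. This ensures the accuracy of translating the to-be-translated sentence, so that the specified dialogue task can proceed correctly and smoothly.
Finally, it should be noted that the foregoing is merely specific implementations of this application, and the protection scope of this application is not limited thereto. Any variation or replacement within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.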

Claims (12)

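  1. A translation method, wherein the method comprises:
    obtaining a to-be-translated sentence, wherein the to-be-translated sentence is a sentence expressed in a first language in a specified dialogue task;
    determining a first named entity set in the to-be-translated sentence and an entity type of each first named entity in the first named entity set, wherein the first named entity set comprises at least one first named entity;
    determining, according to the first named entity set and the entity type of each first named entity, a second named entity set expressed in a second language, wherein the second named entity set comprises at least one second named entity, and the at least one second named entity corresponds to the at least one first named entity;
    determining a source semantic template of the to-be-translated sentence, and obtaining, from a semantic template correspondence of the specified dialogue task, a target semantic template corresponding to the source semantic template, wherein the semantic template correspondence is a correspondence between semantic templates expressed in the first language and semantic templates expressed in the second language; and
    determining a target translated sentence according to the second named entity set and the target semantic template, wherein the target translated sentence is a translated sentence that is expressed in the second language and corresponds to the to-be-translated sentence.
  2. The method according to claim 1, wherein the determining, according to the first named entity set and the entity type of each first named entity, a second named entity set expressed in a second language comprises:
    for each first named entity in the first named entity set, obtaining, according to the entity type of the first named entity, a second named entity corresponding to the first named entity from a named entity correspondence of the specified dialogue task, thereby obtaining the second named entity set, wherein the named entity correspondence is a correspondence between named entities expressed in the first language and named entities expressed in the second language.
  3. The method according to claim 2, wherein the method further comprises:
    determining the named entity correspondence of the specified dialogue task according to a training corpus of the specified dialogue task, wherein the training corpus comprises at least a training corpus expressed in the first language and a training corpus that is expressed in the second language and corresponds to the training corpus expressed in the first language.
  4. The method according to any one of claims 1 to 3, wherein the method further comprises:
    determining the semantic template correspondence of the specified dialogue task according to a training corpus of the specified dialogue task, wherein the training corpus comprises at least a training corpus expressed in the first language and a training corpus that is expressed in the second language and corresponds to the training corpus expressed in the first language.
  5. The method according to any one of claims 1 to 4, wherein the method further comprises:
    displaying first semantic information, wherein the first semantic information comprises the first named entity set and the entity type of each first named entity; and/or
    displaying second semantic information, wherein the second semantic information comprises the second named entity set and the entity type of each second named entity.
  6. The method according to claim 2, wherein the method further comprises:
    if it is determined that a first named entity in the first named entity set does not exist in the named entity correspondence of the specified dialogue task, obtaining, according to the entity type of the first named entity, a third named entity that is expressed in the second language and corresponds to the first named entity; and
    updating the named entity correspondence of the specified dialogue task according to the entity type of the first named entity, the first named entity, and the third named entity.
  7. A translation apparatus, wherein the apparatus comprises:
    an obtaining unit, configured to obtain a to-be-translated sentence, wherein the to-be-translated sentence is a sentence expressed in a first language in a specified dialogue task;
    a first determining unit, configured to determine a first named entity set in the to-be-translated sentence and an entity type of each first named entity in the first named entity set, wherein the first named entity set comprises at least one first named entity;
    the first determining unit being further configured to determine, according to the first named entity set and the entity type of each first named entity, a second named entity set expressed in a second language, wherein the second named entity set comprises at least one second named entity, and the at least one second named entity corresponds to the at least one first named entity;
    a second determining unit, configured to determine a source semantic template of the to-be-translated sentence and obtain, from a semantic template correspondence of the specified dialogue task, a target semantic template corresponding to the source semantic template, wherein the semantic template correspondence is a correspondence between semantic templates expressed in the first language and semantic templates expressed in the second language; and
    a translation unit, configured to determine a target translated sentence according to the second named entity set and the target semantic template, wherein the target translated sentence is a translated sentence that is expressed in the second language and corresponds to the to-be-translated sentence.
  8. The apparatus according to claim 7, wherein the first determining unit is specifically configured to:
    for each first named entity in the first named entity set, obtain, according to the entity type of the first named entity, a second named entity corresponding to the first named entity from a named entity correspondence of the specified dialogue task, thereby obtaining the second named entity set, wherein the named entity correspondence is a correspondence between named entities expressed in the first language and named entities expressed in the second language.
  9. The apparatus according to claim 8, wherein the apparatus further comprises:
    a training unit, configured to determine the named entity correspondence of the specified dialogue task according to a training corpus of the specified dialogue task, wherein the training corpus comprises at least a training corpus expressed in the first language and a training corpus that is expressed in the second language and corresponds to the training corpus expressed in the first language.
  10. The apparatus according to any one of claims 7 to 9, wherein the apparatus further comprises:
    a training unit, configured to determine the semantic template correspondence of the specified dialogue task according to a training corpus of the specified dialogue task, wherein the training corpus comprises at least a training corpus expressed in the first language and a training corpus that is expressed in the second language and corresponds to the training corpus expressed in the first language.
  11. The apparatus according to any one of claims 7 to 10, wherein the apparatus further comprises:
    a display unit, configured to display first semantic information, wherein the first semantic information comprises the first named entity set and the entity type of each first named entity; and/or
    the display unit being further configured to display second semantic information, wherein the second semantic information comprises the second named entity set and the entity type of each second named entity.
  12. The apparatus according to claim 8, wherein
    the obtaining unit is further configured to: if it is determined that a first named entity in the first named entity set does not exist in the named entity correspondence of the specified dialogue task, obtain, according to the entity type of the first named entity, a third named entity that is expressed in the second language and corresponds to the first named entity; and
    the apparatus further comprises an update unit, configured to update the named entity correspondence of the specified dialogue task according to the entity type of the first named entity, the first named entity, and the third named entity.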
PCT/CN2017/112384 2017-02-22 2017-11-22 一种翻译方法及装置 WO2018153130A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP17897771.6A EP3547163A4 (en) 2017-02-22 2017-11-22 METHOD AND DEVICE FOR TRANSLATION
US16/452,439 US11244108B2 (en) 2017-02-22 2019-06-25 Translation method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710097655.9 2017-02-22
CN201710097655.9A CN108460026B (zh) 2017-02-22 2017-02-22 一种翻译方法及装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/452,439 Continuation US11244108B2 (en) 2017-02-22 2019-06-25 Translation method and apparatus

Publications (1)

Publication Number Publication Date
WO2018153130A1 true WO2018153130A1 (zh) 2018-08-30

Family

ID=63220145

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/112384 WO2018153130A1 (zh) 2017-02-22 2017-11-22 一种翻译方法及装置

Country Status (4)

Country Link
US (1) US11244108B2 (zh)
EP (1) EP3547163A4 (zh)
CN (1) CN108460026B (zh)
WO (1) WO2018153130A1 (zh)

Also Published As

Publication number Publication date
EP3547163A1 (en) 2019-10-02
CN108460026A (zh) 2018-08-28
EP3547163A4 (en) 2020-01-15
US11244108B2 (en) 2022-02-08
US20190311038A1 (en) 2019-10-10
CN108460026B (zh) 2021-02-12

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17897771

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2017897771

Country of ref document: EP

Effective date: 20190628

NENP Non-entry into the national phase

Ref country code: DE