WO2020087655A1 - Translation method, apparatus and device, and readable storage medium - Google Patents
- Publication number
- WO2020087655A1 (application PCT/CN2018/119329)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- text
- source language
- training
- translation
- language training
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
Definitions
- the text segmentation model is obtained by training with the source language training text as the training data and with the sentence segmentation result of that text that matches the current translation scenario as the training label.
- the sentence breaking of the source language training text is changed to obtain the changed source language training text, and the candidate source language training texts consist of the changed source language training text and the original source language training text;
- using a preset machine translation model to translate each candidate source language training text, to obtain a machine translation result for each candidate source language training text, includes:
- translating the source language text after sentence segmentation to obtain the target language text includes:
- a text segmentation model determination unit which is used to determine a text segmentation model
- the text segmentation model includes:
- the sentence segmentation result determination unit includes:
- a non-terminating punctuation determining unit configured to determine the non-terminating punctuation included in the source language training text
- a second model training unit configured to use the source language training text as training data and the manually labeled source language training text as training labels to train a text segmentation model, obtaining a preliminary text segmentation model;
- the second clause translation unit is used to translate each clause in the clause sequence of the source language text after the sentence segmentation by using a preset machine translation model to obtain a machine translation result of each clause;
- the sentence breaking in the source language text obtained in the previous step (that is, the punctuation in the source language text) may be affected by the speaker's speaking habits.
- the sentence breaking is not standardized and does not take the current translation scenario into account, so translating the obtained source language text directly yields a low-quality translation result.
- a sentence segmentation step is added for the source language text, and this step takes the current translation scenario into account, so that the segmentation of the source language text after this step better fits the current translation scenario.
- the embodiments of the present application can also synthesize the target language text into speech when the user requires it and then broadcast the speech, completing the conversion from source language speech to target language speech.
- the embodiments of the present application also provide another way to segment the source language text into sentences: the segmentation can be performed by a machine learning model. The detailed process is as follows:
- the machine learning model for sentence segmentation in this embodiment is defined as a text sentence segmentation model. It can use existing machine learning models of various structures, such as a BLSTM model or a Self-Attention model under the sequence labeling framework, or a sequence generation model under the encoder-decoder (Encode-Decode) framework; a combination of several existing model structures can also be used.
- a part of the non-terminating punctuation can be converted into terminating punctuation, so the probability of terminating punctuation occurring increases. Because machine translation translates the content that precedes a terminating punctuation mark, the scheme of the present application shortens the wait for terminating punctuation, which increases the output speed of translation results, reduces the time users perceive themselves waiting for translation results, and improves the user experience.
- the first model training unit is used to train the text segmentation model by using the source language training text as training data and the target sentence segmentation result as a training label.
- a manual labeling result obtaining unit, which is used to obtain the result of manually punctuating the source language training text, yielding the manually labeled source language training text;
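The candidate-generation idea in the definitions above (change the sentence breaking of a training text, translate every candidate, keep one segmentation as the training label) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration, not the patent's implementation: the mapping in `NON_TERMINATING`, the helper names `candidate_segmentations` and `pick_training_label`, and above all the `score` callable, since the excerpt does not state how the segmentation that best fits the translation scenario is chosen.

```python
from itertools import product
from typing import Callable, List

# Assumed mapping from non-terminating marks to a terminating mark.
NON_TERMINATING = {",": ".", ";": "."}


def candidate_segmentations(text: str) -> List[str]:
    """Return the original text plus every variant in which some subset of
    its non-terminating marks is replaced by a terminating mark.
    The first candidate is always the unchanged text."""
    positions = [i for i, ch in enumerate(text) if ch in NON_TERMINATING]
    candidates = []
    # One boolean per mark: False keeps it, True converts it.
    for choice in product((False, True), repeat=len(positions)):
        chars = list(text)
        for pos, flip in zip(positions, choice):
            if flip:
                chars[pos] = NON_TERMINATING[chars[pos]]
        candidates.append("".join(chars))
    return candidates


def pick_training_label(
    text: str,
    translate: Callable[[str], str],   # hypothetical preset MT model
    score: Callable[[str], float],     # hypothetical translation-quality score
) -> str:
    """Translate every candidate and return the candidate segmentation whose
    machine translation scores highest (ties keep the earliest candidate)."""
    candidates = candidate_segmentations(text)
    return max(candidates, key=lambda c: score(translate(c)))
```

The number of candidates doubles with each non-terminating mark, so a practical system would need to limit or sample the candidates; the sketch enumerates them all only to keep the idea visible.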
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
Disclosed are a translation method, apparatus and device, and a readable storage medium. The method comprises: obtaining a source language text to be translated; and further performing sentence segmentation on the source language text according to the current translation scenario, so that the source language text obtained after sentence segmentation better fits the current translation scenario. Compared with existing translation methods, the present invention adds a sentence segmentation optimization step for the obtained source language text: the text is segmented again with the current translation scenario taken into account, which gives a better sentence segmentation, and the re-segmented source language text is then translated, so that the resulting target language text is of higher quality.
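The flow described in the abstract, re-segmenting the source text and then translating the result clause by clause, can be sketched roughly as follows. This is an illustrative outline under stated assumptions rather than the claimed implementation: `resegment` stands in for the text segmentation model and `translate_clause` for the preset machine translation model, both hypothetical callables supplied by the caller, and the set of terminating marks in `_CLAUSE` is also an assumption.

```python
import re
from typing import Callable, List

# Assumed set of terminating punctuation marks (ASCII and full-width).
_CLAUSE = re.compile(r"[^.!?。!?]+[.!?。!?]?")


def split_clauses(text: str) -> List[str]:
    """Split text into clauses, each ending at a terminating mark if present."""
    clauses = (m.strip() for m in _CLAUSE.findall(text))
    return [c for c in clauses if c]


def translate_text(
    text: str,
    resegment: Callable[[str], str],         # hypothetical segmentation model
    translate_clause: Callable[[str], str],  # hypothetical preset MT model
) -> List[str]:
    """Re-segment the source text, then translate each clause in order."""
    segmented = resegment(text)
    return [translate_clause(c) for c in split_clauses(segmented)]
```

Because each clause is translated as soon as it is delimited, a segmentation that produces terminating punctuation earlier lets translation output begin sooner, which matches the latency argument made in the definitions.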
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811276866.XA CN109408833A (zh) | 2018-10-30 | 2018-10-30 | Translation method, apparatus, device and readable storage medium |
CN201811276866.X | 2018-10-30 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020087655A1 (fr) | 2020-05-07 |
Family
ID=65470039
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/119329 WO2020087655A1 (fr) | Translation method, apparatus and device, and readable storage medium | 2018-10-30 | 2018-12-05 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN109408833A (fr) |
WO (1) | WO2020087655A1 (fr) |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
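CN110321532A (zh) * | 2019-06-06 | 2019-10-11 | 数译(成都)信息技术有限公司 | Language preprocessing sentence segmentation method, computer device and computer-readable storage medium |
CN112084795A (zh) * | 2019-06-12 | 2020-12-15 | 阿里巴巴集团控股有限公司 | Translation system, and method and apparatus for invoking a translation service |
CN110232194B (zh) * | 2019-06-17 | 2024-04-09 | 安徽听见科技有限公司 | Translation display method, apparatus, device and readable storage medium |
CN112151019B (zh) * | 2019-06-26 | 2024-09-20 | 阿里巴巴集团控股有限公司 | Text processing method, apparatus and computing device |
CN113591491B (zh) * | 2020-04-30 | 2023-12-26 | 阿里巴巴集团控股有限公司 | Speech translation text correction system, method, apparatus and device |
CN111611811B (zh) * | 2020-05-25 | 2023-01-13 | 腾讯科技(深圳)有限公司 | Translation method, apparatus, electronic device and computer-readable storage medium |
CN111654658B (zh) * | 2020-06-17 | 2022-04-15 | 平安科技(深圳)有限公司 | Audio/video call processing method, system, codec and storage device |
CN112232091B (zh) * | 2020-10-14 | 2021-11-16 | 文思海辉智科科技有限公司 | Content matching method and apparatus, and readable storage medium |
CN112560510B (zh) * | 2020-12-10 | 2023-12-01 | 科大讯飞股份有限公司 | Translation model training method, apparatus, device and storage medium |
CN112668346B (zh) * | 2020-12-24 | 2024-04-30 | 中国科学技术大学 | Translation method, apparatus, device and storage medium |
CN113392657A (zh) * | 2021-06-18 | 2021-09-14 | 北京爱奇艺科技有限公司 | Training sample augmentation method, apparatus, computer device and storage medium |
CN113378586B (zh) * | 2021-07-15 | 2023-03-28 | 北京有竹居网络技术有限公司 | Speech translation method, translation model training method, apparatus, medium and device |
CN113660432B (zh) * | 2021-08-17 | 2024-05-28 | 安徽听见科技有限公司 | Translated subtitle production method, apparatus, electronic device and storage medium |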
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030055626A1 (en) * | 2001-09-19 | 2003-03-20 | International Business Machines Corporation | Sentence segmentation method and sentence segmentation apparatus, machine translation system, and program product using sentence segmentation method |
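CN101458681A (zh) * | 2007-12-10 | 2009-06-17 | 株式会社东芝 | Speech translation method and speech translation apparatus |
CN103530284A (zh) * | 2013-09-22 | 2014-01-22 | 中国专利信息中心 | Short-sentence segmentation apparatus, machine translation system, and corresponding segmentation method and translation method |
CN107247706A (zh) * | 2017-06-16 | 2017-10-13 | 中国电子技术标准化研究院 | Method for building a text sentence segmentation model, sentence segmentation method, apparatus and computer device |
CN108628819A (zh) * | 2017-03-16 | 2018-10-09 | 北京搜狗科技发展有限公司 | Processing method and apparatus, and apparatus for processing |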
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10303777B2 (en) * | 2016-08-08 | 2019-05-28 | Netflix, Inc. | Localization platform that leverages previously translated content |
2018
- 2018-10-30 CN CN201811276866.XA (CN109408833A, zh): active, Pending
- 2018-12-05 WO PCT/CN2018/119329 (WO2020087655A1, fr): active, Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN109408833A (zh) | 2019-03-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020087655A1 (fr) | Translation method, apparatus and device, and readable storage medium | |
US20210280190A1 (en) | Human-machine interaction | |
CN105869629B (zh) | Speech recognition method and apparatus | |
US20200193217A1 (en) | Method for determining sentence similarity | |
WO2018157703A1 (fr) | Natural language semantic extraction method and device, and computer storage medium | |
WO2019232991A1 (fr) | Method for recognizing conference speech as text, electronic device and storage medium | |
CN107301170B (zh) | Artificial-intelligence-based method and apparatus for segmenting sentences | |
CN111402861B (zh) | Speech recognition method, apparatus, device and storage medium | |
CN110415680B (zh) | Simultaneous interpretation method, simultaneous interpretation apparatus and electronic device | |
CN109976702A (zh) | Speech recognition method, apparatus and terminal | |
WO2020103447A1 (fr) | Linked-type storage method and apparatus for video information, computer device and storage medium | |
CN113536007A (zh) | Virtual avatar generation method, apparatus, device and storage medium | |
CN112560510A (zh) | Translation model training method, apparatus, device and storage medium | |
WO2021159655A1 (fr) | Data attribute filling method, apparatus and device, and computer-readable storage medium | |
CN110633475A (zh) | Computer-scenario-based natural language understanding method, apparatus, system and storage medium | |
CN110728983B (zh) | Information display method, apparatus, device and readable storage medium | |
CN112101003B (zh) | Sentence text segmentation method, apparatus, device and computer-readable storage medium | |
WO2020199590A1 (fr) | Mood detection analysis method and related device | |
KR20190074508A (ko) | Data crowdsourcing method for a dialogue model for chatbots | |
CN109408621B (zh) | Dialogue sentiment analysis method and system | |
CN112530417A (zh) | Speech signal processing method, apparatus, electronic device and storage medium | |
CN113553833B (zh) | Text error correction method, apparatus and electronic device | |
CN110162794A (zh) | Word segmentation method and server | |
CN113851106B (zh) | Audio playback method, apparatus, electronic device and readable storage medium | |
WO2022267451A1 (fr) | Neural-network-based automatic speech recognition method, device and readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 18939010; Country of ref document: EP; Kind code of ref document: A1 |